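After the system was deployed to the cloud environment, it worked normally on the first day but stopped working on the second. The Tomcat log showed frequent Too many open files errors, and restarting the application restored normal operation.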
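First, get the Tomcat PID:
ps -ef | grep tomcat
Then use the lsof command to inspect the connection handles held by the Tomcat process (replace PID below with the actual process ID):
# Count handles by TYPE, in descending order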
lsof -n | grep PID | awk '{print $5}' | sort | uniq -c | sort -nr | head -n 10
# Count handles by NAME, in descending order
lsof -n | grep PID | awk '{print $9" "$10}' | sort | uniq -c | sort -nr | head -n 10
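One caveat with the pipelines above: grep PID matches the number anywhere on the line, so PID 123 also matches rows for PID 1234 or for any inode that happens to contain 123. Below is a minimal sketch of a stricter variant, assuming the default lsof column layout (PID in field 2, TYPE in field 5). count_by_type is a hypothetical helper name, and it reads lsof output from standard input so it can also be tried on saved output.

```shell
#!/bin/sh
# Count lsof rows that belong to exactly one PID, grouped by the TYPE column.
# $1 = PID to match against field 2; lsof output is read from stdin.
count_by_type() {
    awk -v pid="$1" '$2 == pid { print $5 }' | sort | uniq -c | sort -nr | head -n 10
}

# Usage (12345 is a placeholder PID):
#   lsof -n -p 12345 | count_by_type 12345
```

Passing -p to lsof also limits the scan to one process, which is usually much faster than listing every process and filtering afterwards.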
The output showed that the largest group of handles had TYPE socket and NAME can't identify protocol.
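A web search showed that socket handles reported as can't identify protocol are usually caused by a socket stream that was used but never closed, so I started tracking down the problem in the code.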
Finding the problematic endpoint
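In an identical environment with no other traffic interfering, prepare a small script that continuously monitors the number of connection handles whose NAME is can't identify protocol.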
test.sh
#!/bin/sh
# Print the ten most common handle NAMEs for the given PID once per second.
while true; do
    lsof -n | grep "$1" | awk '{print $9" "$10}' | sort | uniq -c | sort -nr | head -n 10
    sleep 1
done
chmod +x test.sh
./test.sh tomcat_pid
Once started, the script refreshes every second and shows the number of Tomcat handles whose NAME is can't identify protocol.
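Running lsof -n across every process once per second can be slow on a busy host. As a lighter-weight complement, here is a sketch that counts a process's socket descriptors by reading the symlinks under /proc/PID/fd. It assumes Linux and permission to read that directory. It counts all sockets, not only the can't identify protocol ones, so it shows the growth trend rather than the exact number of leaked handles. count_socket_fds is a hypothetical helper name.

```shell
#!/bin/sh
# Count entries in an fd directory whose symlink target looks like "socket:[inode]".
# $1 = directory to scan, e.g. /proc/12345/fd (12345 is a placeholder PID).
count_socket_fds() {
    dir="$1"
    n=0
    for fd in "$dir"/*; do
        case "$(readlink "$fd" 2>/dev/null)" in
            socket:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Usage: print the socket count once per second.
#   while true; do count_socket_fds /proc/12345/fd; sleep 1; done
```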
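Then test the production endpoints one by one. Before and after each call, watch how the number of handles whose NAME is can't identify protocol grows, until the problematic endpoint is found.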
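The before-and-after comparison can also be scripted rather than read off the screen. The sketch below is one possible way to do it, assuming the default lsof column layout. count_unidentified and leak_delta are hypothetical helper names, and the endpoint URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Count rows for one PID whose NAME is "can't identify protocol".
# $1 = PID to match against field 2; lsof output is read from stdin.
count_unidentified() {
    awk -v pid="$1" '$2 == pid && /can.t identify protocol/ { n++ } END { print n + 0 }'
}

# Print how many such handles appeared while a command ran.
# $1 = Tomcat PID, remaining arguments = the command to run.
leak_delta() {
    pid="$1"; shift
    before=$(lsof -n -p "$pid" | count_unidentified "$pid")
    "$@" >/dev/null 2>&1
    after=$(lsof -n -p "$pid" | count_unidentified "$pid")
    echo $((after - before))
}

# Usage (PID and URL are placeholders):
#   leak_delta 12345 curl -s http://localhost:8080/some/endpoint
```

A positive number suggests the call left handles behind. The count may lag slightly behind the request, so repeating the call a few times gives a more reliable signal than a single run.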
Locating the problem in the code
Once the problematic endpoint is found, enable Tomcat remote debugging.
In $TOMCAT_HOME/bin/catalina.sh, change the JPDA_ADDRESS entry to the port that remote debugging should bind to:
if [ "$1" = "jpda" ] ; then
  if [ -z "$JPDA_TRANSPORT" ]; then
    JPDA_TRANSPORT="dt_socket"
  fi
  if [ -z "$JPDA_ADDRESS" ]; then
    JPDA_ADDRESS="5050"
  fi
  if [ -z "$JPDA_SUSPEND" ]; then
    JPDA_SUSPEND="n"
  fi
  if [ -z "$JPDA_OPTS" ]; then
    JPDA_OPTS="-agentlib:jdwp=transport=$JPDA_TRANSPORT,address=$JPDA_ADDRESS,server=y,suspend=$JPDA_SUSPEND"
  fi
  CATALINA_OPTS="$CATALINA_OPTS $JPDA_OPTS"
  shift
fi
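Since the excerpt assigns each JPDA_* variable only when it is empty, the port can presumably also be supplied through the environment instead of editing catalina.sh. The sketch below mirrors that defaulting logic in a standalone function, so the resulting option string can be checked without starting Tomcat. jpda_opts is a hypothetical helper for illustration, not part of Tomcat, and the 5050 default is the value used in this article.

```shell
#!/bin/sh
# Build the JDWP option string the same way the catalina.sh excerpt does:
# each JPDA_* variable keeps its environment value, else falls back to a default.
jpda_opts() {
    transport="${JPDA_TRANSPORT:-dt_socket}"
    address="${JPDA_ADDRESS:-5050}"
    suspend="${JPDA_SUSPEND:-n}"
    echo "-agentlib:jdwp=transport=$transport,address=$address,server=y,suspend=$suspend"
}

# Usage:
#   jpda_opts                          # defaults from the excerpt
#   JPDA_ADDRESS='*:5050' jpda_opts    # override the address from the environment
```

With the real script, the equivalent would be something like JPDA_ADDRESS=5050 $TOMCAT_HOME/bin/catalina.sh jpda start. On JDK 9 and later, a bare port number makes the debug agent listen on localhost only, so a remote debugger may need the *:5050 form.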
In $TOMCAT_HOME/bin/startup.sh, change exec "$PRGDIR"/"$EXECUTABLE" start "$@"
to exec "$PRGDIR"/"$EXECUTABLE" jpda start "$@":
EXECUTABLE=catalina.sh

# Check that target executable exists
if $os400; then
  # -x will Only work on the os400 if the files are:
  # 1. owned by the user
  # 2. owned by the PRIMARY group of the user
  # this will not work if the user belongs in secondary groups
  eval
else
  if [ ! -x "$PRGDIR"/"$EXECUTABLE" ]; then
    echo "Cannot find $PRGDIR/$EXECUTABLE"
    echo "The file is absent or does not have execute permission"
    echo "This file is needed to run this program"
    exit 1
  fi
fi

exec "$PRGDIR"/"$EXECUTABLE" jpda start "$@"
Open the remote debugging port in the firewall:
/sbin/iptables -I INPUT -p tcp --dport 5050 -j ACCEPT
/etc/rc.d/init.d/iptables save
/etc/rc.d/init.d/iptables restart
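Restart the Tomcat service.
Use IDEA to debug Tomcat remotely (the local code must match the deployed code exactly, otherwise breakpoints will not be hit).
Call the endpoint and step through the code line by line, watching how the number of handles whose NAME is can't identify protocol grows, until the offending code is found. Then fix it.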