Troubleshooting Record: "Too many open files"

Environment

  • Tomcat 8.0
  • CentOS 6.4
  • JDK 8

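Problem description

After the system was deployed to the cloud environment, it worked normally on the first day but stopped working on the second. The Tomcat log was full of "Too many open files" errors. Restarting the application restored normal service.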
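Before looking for what is leaking, it can help to confirm that the process really is close to its descriptor limit. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function names count_fds and fd_limit are illustrative and not part of the original investigation. Reading another user's /proc entries typically requires running as that user or as root.

```shell
#!/bin/sh
# count_fds PID: number of file descriptors currently open in the process.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# fd_limit PID: soft "Max open files" limit that applies to the process.
# In /proc/PID/limits the soft limit is the fourth whitespace-separated field.
fd_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Inspect the given PID, or this shell itself when no PID is passed.
pid=${1:-$$}
echo "open: $(count_fds "$pid")  limit: $(fd_limit "$pid")"
```

Passing the Tomcat PID as the first argument shows how close the process is to the limit that triggers the error.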

Troubleshooting log

  1. Wait until the problem has occurred, and do not restart the Tomcat service.
  2. Find the PID of the Tomcat process.
ps -ef | grep tomcat

Note the Tomcat PID; it is needed below to inspect the handles that the Tomcat process holds open.

  1. Use lsof to count the handles held by the Tomcat process.
# Count handles by TYPE, in descending order (replace PID with the Tomcat PID)
lsof -n -p PID | awk '{print $5}' | sort | uniq -c | sort -nr | head -n 10
# Count handles by NAME, in descending order
lsof -n -p PID | awk '{print $9" "$10}' | sort | uniq -c | sort -nr | head -n 10

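The counts showed that the most numerous handles were:

  • TYPE: socket (lsof prints this as sock)
  • NAME: can't identify protocol

A web search showed that socket handles reported as can't identify protocol are usually left behind when a socket stream is used and then never closed, so the next step was to locate the faulty code.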

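  1. Find the problematic endpoint
    In an identical environment with no other traffic, prepare a small script that continuously monitors the number of handles whose NAME is can't identify protocol.
    test.sh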

    #!/bin/sh
    # Usage: ./test.sh TOMCAT_PID
    # Print the ten most common handle NAMEs of the process once per second.
    while true; do
        lsof -n -p "$1" | awk '{print $9" "$10}' | sort | uniq -c | sort -nr | head -n 10
        sleep 1
    done
    
    chmod +x test.sh
    ./test.sh tomcat_pid
    

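    Once started, the script refreshes every second and shows how many handles of the Tomcat process have the NAME can't identify protocol.

    Then test the production endpoints one by one: compare the number of can't identify protocol handles before and after each call until the endpoint that makes the count grow is found.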
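The before/after comparison described above can also be scripted per endpoint. This is only a sketch under stated assumptions: the helper names leak_count and probe_endpoint are hypothetical, curl is assumed to be installed, the URL is whatever endpoint is under test, and lsof must be run as the Tomcat user or as root.

```shell
#!/bin/sh
# leak_count PID: number of handles lsof reports as "can't identify protocol".
# grep -c exits non-zero when the count is 0, hence the trailing "|| true".
leak_count() {
    lsof -n -p "$1" 2>/dev/null | grep -c "can't identify protocol" || true
}

# probe_endpoint PID URL: count, call the endpoint once, count again,
# and print the difference.
probe_endpoint() {
    pid=$1
    url=$2
    before=$(leak_count "$pid")
    curl -s -o /dev/null "$url"
    sleep 1   # give the request a moment to finish
    after=$(leak_count "$pid")
    echo "$url before=$before after=$after delta=$((${after} - ${before}))"
}

# Run only when both a PID and a URL are supplied on the command line.
if [ "$#" -ge 2 ]; then
    probe_endpoint "$1" "$2"
fi
```

A delta that stays above zero across repeated calls to the same endpoint marks it as a likely source of the leak; a single call can be noisy, so repeating the probe a few times is safer.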

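  2. Locate the faulty code
    Once the problematic endpoint is found, enable Tomcat remote debugging.
    In $TOMCAT_HOME/bin/catalina.sh, change the JPDA_ADDRESS setting to the port that remote debugging should bind to: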

    if [ "$1" = "jpda" ] ; then
      if [ -z "$JPDA_TRANSPORT" ]; then
        JPDA_TRANSPORT="dt_socket"
      fi
      if [ -z "$JPDA_ADDRESS" ]; then
        JPDA_ADDRESS="5050"
      fi
      if [ -z "$JPDA_SUSPEND" ]; then
        JPDA_SUSPEND="n"
      fi
      if [ -z "$JPDA_OPTS" ]; then
        JPDA_OPTS="-agentlib:jdwp=transport=$JPDA_TRANSPORT,address=$JPDA_ADDRESS,server=y,suspend=$JPDA_SUSPEND"
      fi
      CATALINA_OPTS="$CATALINA_OPTS $JPDA_OPTS"
      shift
    fi
    
    

    In $TOMCAT_HOME/bin/startup.sh, change exec "$PRGDIR"/"$EXECUTABLE" start "$@" to exec "$PRGDIR"/"$EXECUTABLE" jpda start "$@":

    EXECUTABLE=catalina.sh

    # Check that target executable exists
    if $os400; then
      # -x will Only work on the os400 if the files are:
      # 1. owned by the user
      # 2. owned by the PRIMARY group of the user
      # this will not work if the user belongs in secondary groups
      eval
    else
      if [ ! -x "$PRGDIR"/"$EXECUTABLE" ]; then
        echo "Cannot find $PRGDIR/$EXECUTABLE"
        echo "The file is absent or does not have execute permission"
        echo "This file is needed to run this program"
        exit 1
      fi
    fi

    exec "$PRGDIR"/"$EXECUTABLE" jpda start "$@"
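The catalina.sh excerpt above only applies a default when the corresponding variable is empty, so the same result can probably be reached without editing either script, by setting JPDA_ADDRESS in the environment and calling catalina.sh with the jpda argument. The sketch below mirrors that defaulting logic; the function name jpda_opts is illustrative, and 5050 is simply the port chosen in this post.

```shell
#!/bin/sh
# jpda_opts: assemble the JDWP option string the same way the catalina.sh
# excerpt does, using a default only when a variable is unset or empty.
jpda_opts() {
    transport=${JPDA_TRANSPORT:-dt_socket}
    address=${JPDA_ADDRESS:-5050}   # 5050 is the value used in this post
    suspend=${JPDA_SUSPEND:-n}
    echo "-agentlib:jdwp=transport=$transport,address=$address,server=y,suspend=$suspend"
}

# Show the options that would be passed to the JVM.
jpda_opts

# Starting Tomcat in debug mode without modifying the scripts:
#   JPDA_ADDRESS=5050 "$TOMCAT_HOME/bin/catalina.sh" jpda start
```

Keeping the scripts untouched makes it easier to return to a normal start once the debugging session is over.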
    
    

    Open the remote debugging port in the firewall:

    /sbin/iptables -I INPUT -p tcp --dport 5050 -j ACCEPT
    /etc/rc.d/init.d/iptables save
    /etc/rc.d/init.d/iptables restart
    

    Restart the Tomcat service.
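After the restart, it may be worth checking that the JVM is actually listening on the debug port before attaching a debugger. A minimal sketch, assuming netstat (or ss) is available on the host; the helper name listening_on is hypothetical and it only parses the listing passed on standard input.

```shell
#!/bin/sh
# listening_on PORT: succeed if the `netstat -ltn` (or `ss -ltn`) output on
# stdin contains a listener whose local address (4th column) ends in :PORT.
listening_on() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Example, using the port configured in this post:
#   netstat -ltn | listening_on 5050 && echo "debug port is listening"
```

If the check fails, the jpda argument was probably not passed at startup or the port is already in use by another process.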

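    Attach to Tomcat with IDEA's remote debugger (the local code must match the deployed code exactly, otherwise breakpoints will not be hit).

    Call the endpoint and step through it line by line, watching how the number of can't identify protocol handles grows, to find the offending code and fix it.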


Title: Troubleshooting Record: "Too many open files"
Author: tlbcc
URL: http://blog.tlbcc.cc/articles/2020/11/21/1605944092956.html