09 spring-boot-actuator periodic health checks against a Redis cluster cause "IOException: Too many open files"

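Preface

The symptoms were as follows.

Right after startup the project ran normally, but after some time, say four or five days of running,

it suddenly started failing: some services were still reachable while others threw exceptions.

The exception was "Too many open files".

The key change was that the Redis cluster had no password before the failure. One day a password was added to the cluster, and the problem above appeared afterwards.

Admittedly, this cause was only inferred later from the findings below.

The exception output is as follows.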

java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:450)
        at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:73)
        at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:95)
        at java.lang.Thread.run(Thread.java:748)
2023-06-16 11:03:00.354 [http-nio-8291-Acceptor-0] ERROR org.apache.tomcat.util.net.Acceptor - Socket accept failed
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:450)
        at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:73)
        at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:95)
        at java.lang.Thread.run(Thread.java:748)

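When "Too many open files" appears, the usual approach is to check whether the project really needs that many open files, network connections, sockets, pipes and so on.

If the project genuinely needs that many FileDescriptors, raising the Linux open files limit is enough.

But if most of them are abnormal FileDescriptors, the underlying problem has to be tracked down.

Our case was the latter: a large number of abnormal FileDescriptors.

Debugging the problem

First, a look at the scene.

Listing all fds of the current process shows a large number of socket FileDescriptors.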

root@ubuntu:~# ll /proc/17843/fd | grep socket
lrwx------ 1 root root 64 Jun 21 09:13 14 -> socket:[1092572]
lrwx------ 1 root root 64 Jun 21 09:13 158 -> socket:[1091360]
lrwx------ 1 root root 64 Jun 21 09:13 183 -> socket:[1091361]
lrwx------ 1 root root 64 Jun 21 09:13 208 -> socket:[1091362]
lrwx------ 1 root root 64 Jun 21 09:13 209 -> socket:[1091363]
lrwx------ 1 root root 64 Jun 21 09:13 210 -> socket:[1091364]
lrwx------ 1 root root 64 Jun 21 09:13 211 -> socket:[1091365]
lrwx------ 1 root root 64 Jun 21 09:13 212 -> socket:[1098021]
lrwx------ 1 root root 64 Jun 21 09:13 213 -> socket:[1091372]
lrwx------ 1 root root 64 Jun 21 09:13 214 -> socket:[1091373]
lrwx------ 1 root root 64 Jun 21 09:13 215 -> socket:[1091374]
lrwx------ 1 root root 64 Jun 21 09:13 216 -> socket:[1093391]
lrwx------ 1 root root 64 Jun 21 09:13 217 -> socket:[1093392]
lrwx------ 1 root root 64 Jun 21 09:13 218 -> socket:[1094944]
lrwx------ 1 root root 64 Jun 21 09:13 219 -> socket:[1091375]
lrwx------ 1 root root 64 Jun 21 09:13 220 -> socket:[1091376]
lrwx------ 1 root root 64 Jun 21 09:13 221 -> socket:[1091377]
lrwx------ 1 root root 64 Jun 21 09:13 222 -> socket:[1091378]
lrwx------ 1 root root 64 Jun 21 09:13 223 -> socket:[1091379]
lrwx------ 1 root root 64 Jun 21 09:13 224 -> socket:[1091380]
lrwx------ 1 root root 64 Jun 21 09:13 225 -> socket:[1091389]
lrwx------ 1 root root 64 Jun 21 09:13 226 -> socket:[1091390]
lrwx------ 1 root root 64 Jun 21 09:13 227 -> socket:[1091391]
lrwx------ 1 root root 64 Jun 21 09:13 228 -> socket:[1091392]
lrwx------ 1 root root 64 Jun 21 09:13 229 -> socket:[1091393]
lrwx------ 1 root root 64 Jun 21 09:13 230 -> socket:[1091394]
lrwx------ 1 root root 64 Jun 21 09:13 231 -> socket:[1091401]
lrwx------ 1 root root 64 Jun 21 09:13 232 -> socket:[1094176]
lrwx------ 1 root root 64 Jun 21 09:13 233 -> socket:[1094181]
lrwx------ 1 root root 64 Jun 21 09:13 234 -> socket:[1094238]
lrwx------ 1 root root 64 Jun 21 09:13 235 -> socket:[1094247]
lrwx------ 1 root root 64 Jun 21 09:13 236 -> socket:[1094248]
lrwx------ 1 root root 64 Jun 21 09:13 237 -> socket:[1095491]
lrwx------ 1 root root 64 Jun 21 09:13 238 -> socket:[1095494]
lrwx------ 1 root root 64 Jun 21 09:13 239 -> socket:[1095495]
lrwx------ 1 root root 64 Jun 21 09:13 240 -> socket:[1095496]
lrwx------ 1 root root 64 Jun 21 09:13 241 -> socket:[1095497]
lrwx------ 1 root root 64 Jun 21 09:13 242 -> socket:[1095498]
lrwx------ 1 root root 64 Jun 21 09:13 243 -> socket:[1091535]
lrwx------ 1 root root 64 Jun 21 09:13 244 -> socket:[1091536]
lrwx------ 1 root root 64 Jun 21 09:13 245 -> socket:[1091537]
lrwx------ 1 root root 64 Jun 21 09:13 246 -> socket:[1091538]
lrwx------ 1 root root 64 Jun 21 09:13 247 -> socket:[1091539]
lrwx------ 1 root root 64 Jun 21 09:13 248 -> socket:[1091540]
lrwx------ 1 root root 64 Jun 21 09:13 249 -> socket:[1098022]
lrwx------ 1 root root 64 Jun 21 09:13 25 -> socket:[1092819]
lrwx------ 1 root root 64 Jun 21 09:13 250 -> socket:[1098023]
lrwx------ 1 root root 64 Jun 21 09:13 251 -> socket:[1098024]
lrwx------ 1 root root 64 Jun 21 09:13 252 -> socket:[1098025]
lrwx------ 1 root root 64 Jun 21 09:13 253 -> socket:[1098026]
lrwx------ 1 root root 64 Jun 21 09:13 254 -> socket:[1110159]
lrwx------ 1 root root 64 Jun 21 09:13 255 -> socket:[1098782]
lrwx------ 1 root root 64 Jun 21 09:13 256 -> socket:[1098783]
lrwx------ 1 root root 64 Jun 21 09:13 257 -> socket:[1095991]
lrwx------ 1 root root 64 Jun 21 09:13 258 -> socket:[1095992]
lrwx------ 1 root root 64 Jun 21 09:13 259 -> socket:[1095993]
lrwx------ 1 root root 64 Jun 21 09:13 26 -> socket:[1092821]
lrwx------ 1 root root 64 Jun 21 09:13 260 -> socket:[1095994]
lrwx------ 1 root root 64 Jun 21 09:26 261 -> socket:[1101540]
lrwx------ 1 root root 64 Jun 21 09:26 262 -> socket:[1101541]
lrwx------ 1 root root 64 Jun 21 09:26 263 -> socket:[1101542]
lrwx------ 1 root root 64 Jun 21 09:26 264 -> socket:[1101543]
lrwx------ 1 root root 64 Jun 21 09:26 265 -> socket:[1101544]
lrwx------ 1 root root 64 Jun 21 09:26 266 -> socket:[1101545]
lrwx------ 1 root root 64 Jun 21 09:26 267 -> socket:[1104166]
lrwx------ 1 root root 64 Jun 21 09:26 268 -> socket:[1104167]
lrwx------ 1 root root 64 Jun 21 09:26 269 -> socket:[1104168]
lrwx------ 1 root root 64 Jun 21 09:26 270 -> socket:[1104169]
lrwx------ 1 root root 64 Jun 21 09:26 271 -> socket:[1104170]
lrwx------ 1 root root 64 Jun 21 09:26 272 -> socket:[1104171]
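
The same count can also be taken from code, for example to log it at intervals and watch the trend. Below is a minimal sketch, assuming a Linux /proc filesystem; FdCounter and its two methods are hypothetical helpers written for this post, not part of the project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FdCounter {
    /** True when a /proc/<pid>/fd link target denotes a socket, e.g. "socket:[1092572]". */
    public static boolean isSocketTarget(String target) {
        return target.startsWith("socket:[") && target.endsWith("]");
    }

    /** Counts the socket descriptors of a process by reading the symlinks under /proc/<pid>/fd. */
    public static long countSocketFds(String pid) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                try {
                    if (isSocketTarget(Files.readSymbolicLink(fd).toString())) {
                        count++;
                    }
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading it; skip it.
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "self" inspects the JVM running this class; pass a pid such as 17843 to inspect another process.
        String pid = args.length > 0 ? args[0] : "self";
        System.out.println("socket fds: " + countSocketFds(pid));
    }
}
```

Reading another user's /proc/<pid>/fd needs the same permissions as the ll command above.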

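Next, use ss to inspect the sockets of the current process. Only here do the peer host and port become visible, which gives a rough picture of what the FileDescriptors are.

What is clear from this output is that a large number of abnormal sockets connected to the Redis cluster were never closed properly. ss prints ports 7001 to 7006, the ports of the Redis cluster nodes, as the service names afs3-callback, afs3-prserver, afs3-vlserver, afs3-kaserver, afs3-volser and afs3-errors.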

root@ubuntu:~/docker/ay-resource-portals# ss -tiap | grep 17843
ESTAB      0      0      192.168.220.140:5005                 192.168.220.1:62199                 users:(("java",pid=17843,fd=5))
LISTEN     0      100       :::8291                    :::*                     users:(("java",pid=17843,fd=35))
ESTAB      0      0         ::ffff:192.168.220.140:52566                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=261))
ESTAB      0      0         ::ffff:192.168.220.140:58626                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=239))
ESTAB      0      0         ::ffff:192.168.220.140:58726                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=257))
ESTAB      0      0         ::ffff:192.168.220.140:60864                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=235))
ESTAB      0      0         ::ffff:192.168.220.140:56576                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=248))
ESTAB      0      0         ::ffff:192.168.220.140:56392                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=211))
ESTAB      0      0         ::ffff:192.168.220.140:58658                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=245))
ESTAB      0      0         ::ffff:192.168.220.140:56784                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=272))
ESTAB      0      0         ::ffff:192.168.220.140:32840                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=265))
ESTAB      0      0         ::ffff:192.168.220.140:52478                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=255))
ESTAB      0      0         ::ffff:192.168.220.140:40552                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=226))
ESTAB      0      0         ::ffff:192.168.220.140:56614                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=253))
ESTAB      0      0         ::ffff:192.168.220.140:40590                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=232))
ESTAB      0      0         ::ffff:192.168.220.140:46740                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=228))
ESTAB      0      0         ::ffff:192.168.220.140:52340                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=231))
ESTAB      0      0         ::ffff:192.168.220.140:52312                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=225))
ESTAB      0      0         ::ffff:192.168.220.140:56522                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=236))
ESTAB      0      0         ::ffff:192.168.220.140:56734                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=266))
ESTAB      0      0         ::ffff:192.168.220.140:35912                   ::ffff:192.168.220.133:27017                 users:(("java",pid=17843,fd=30))
ESTAB      0      0         ::ffff:192.168.220.140:60768                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=217))
ESTAB      0      0         ::ffff:192.168.220.140:46834                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=246))
ESTAB      0      0         ::ffff:192.168.220.140:58534                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=221))
ESTAB      0      0         ::ffff:192.168.220.140:35938                   ::ffff:192.168.220.133:27017                 users:(("java",pid=17843,fd=49))
ESTAB      0      0         ::ffff:192.168.220.140:40688                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=249))
ESTAB      0      0         ::ffff:192.168.220.140:46906                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=258))
ESTAB      0      0         ::ffff:192.168.220.140:40806                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=262))
ESTAB      0      0         ::ffff:192.168.220.140:56428                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=218))
ESTAB      0      0         ::ffff:192.168.220.140:58468                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=208))
ESTAB      0      0         ::ffff:192.168.220.140:33660                ::ffff:10.30.2.25:mysql                 users:(("java",pid=17843,fd=33))
ESTAB      0      0         ::ffff:192.168.220.140:52616                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=267))
ESTAB      0      0         ::ffff:192.168.220.140:36766                   ::ffff:10.60.50.16:8848                  users:(("java",pid=17843,fd=26))
ESTAB      0      0         ::ffff:192.168.220.140:56454                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=224))
ESTAB      0      0         ::ffff:192.168.220.140:37262                   ::ffff:10.60.50.16:8848                  users:(("java",pid=17843,fd=254))
ESTAB      0      0         ::ffff:192.168.220.140:40718                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=256))
ESTAB      0      0         ::ffff:192.168.220.140:58516                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=215))
ESTAB      0      0         ::ffff:192.168.220.140:60822                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=229))
ESTAB      0      0         ::ffff:192.168.220.140:60918                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=247))
ESTAB      0      0         ::ffff:192.168.220.140:46992                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=264))
ESTAB      0      0         ::ffff:192.168.220.140:46782                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=234))
ESTAB      0      0         ::ffff:192.168.220.140:60988                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=259))
ESTAB      0      0         ::ffff:192.168.220.140:58818                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=263))
ESTAB      0      0         ::ffff:192.168.220.140:58698                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=250))
ESTAB      0      0         ::ffff:192.168.220.140:60734                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=210))
ESTAB      0      0         ::ffff:192.168.220.140:52258                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=213))
ESTAB      0      0         ::ffff:192.168.220.140:40856                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=268))
ESTAB      0      0         ::ffff:192.168.220.140:46808                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=240))
ESTAB      0      0         ::ffff:192.168.220.140:58602                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=233))
ESTAB      0      0         ::ffff:192.168.220.140:52380                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=237))
ESTAB      0      0         ::ffff:192.168.220.140:52230                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=158))
ESTAB      0      0         ::ffff:192.168.220.140:46714                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=222))
ESTAB      0      0         ::ffff:192.168.220.140:60796                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=223))
ESTAB      0      0         ::ffff:192.168.220.140:60960                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=252))
ESTAB      0      0         ::ffff:192.168.220.140:52448                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=212))
ESTAB      0      0         ::ffff:192.168.220.140:47046                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=270))
ESTAB      0      0         ::ffff:192.168.220.140:32892                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=271))
ESTAB      0      0         ::ffff:192.168.220.140:40526                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=220))
ESTAB      0      0         ::ffff:192.168.220.140:46874                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=251))
ESTAB      0      0         ::ffff:192.168.220.140:46690                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=216))
ESTAB      0      0         ::ffff:192.168.220.140:46648                   ::ffff:192.168.220.133:afs3-kaserver         users:(("java",pid=17843,fd=209))
ESTAB      0      0         ::ffff:192.168.220.140:40622                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=238))
ESTAB      0      0         ::ffff:192.168.220.140:56480                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=230))
ESTAB      0      0         ::ffff:192.168.220.140:58864                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=269))
ESTAB      0      0         ::ffff:192.168.220.140:40498                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=214))
ESTAB      0      0         ::ffff:192.168.220.140:56546                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=242))
ESTAB      0      0         ::ffff:192.168.220.140:60892                   ::ffff:192.168.220.133:afs3-volser           users:(("java",pid=17843,fd=241))
ESTAB      0      0         ::ffff:192.168.220.140:40650                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=244))
ESTAB      0      0         ::ffff:192.168.220.140:56646                   ::ffff:192.168.220.133:afs3-errors           users:(("java",pid=17843,fd=260))
ESTAB      0      0         ::ffff:192.168.220.140:58560                   ::ffff:192.168.220.133:afs3-vlserver         users:(("java",pid=17843,fd=227))
ESTAB      0      0         ::ffff:192.168.220.140:52288                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=219))
ESTAB      0      0         ::ffff:192.168.220.140:37272                   ::ffff:10.60.50.16:8848                  users:(("java",pid=17843,fd=45))
ESTAB      0      0         ::ffff:192.168.220.140:40466                   ::ffff:192.168.220.133:afs3-prserver         users:(("java",pid=17843,fd=183))
ESTAB      0      0         ::ffff:192.168.220.140:52410                   ::ffff:192.168.220.133:afs3-callback         users:(("java",pid=17843,fd=243))
ESTAB      0      0         ::ffff:192.168.220.140:8291                    ::ffff:192.168.220.140:40466                 users:(("java",pid=17843,fd=46))

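Reconstructing the scene took quite some time. With the cluster set up, the password configured and the project started, the expected mass creation of FileDescriptors did not happen.

Debugging the production deployment then showed that these connections to Redis were created by the probe requests sent to actuator.

So we issue such a probe request ourselves and watch the fd information. The Redis cluster has six nodes in total. After every request, six more FileDescriptors appear under /proc/$pid/fd, and they are not released for a long time.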

curl http://192.168.220.140:8291/actuator/health 

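At this point the problem is essentially reproduced. Production runs spring-boot-admin, which periodically calls /actuator/health on every microservice node to check its status, so the number of FileDescriptors keeps growing over time.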
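
To get a feel for how quickly this adds up, here is a rough back-of-the-envelope sketch. Only the six leaked descriptors per request come from the observation above. The descriptor limit, the baseline count and the polling interval are placeholder assumptions, and the class and method names are made up for illustration.

```java
import java.time.Duration;

public class FdExhaustionEstimate {
    /** Number of health-check requests that fit before the descriptor limit is reached. */
    public static long requestsUntilLimit(long fdLimit, long fdsAlreadyOpen, int fdsLeakedPerRequest) {
        return Math.max(0, (fdLimit - fdsAlreadyOpen) / fdsLeakedPerRequest);
    }

    /** Time until the limit is reached when one health check arrives every pollInterval. */
    public static Duration timeUntilLimit(long fdLimit, long fdsAlreadyOpen,
                                          int fdsLeakedPerRequest, Duration pollInterval) {
        return pollInterval.multipliedBy(requestsUntilLimit(fdLimit, fdsAlreadyOpen, fdsLeakedPerRequest));
    }

    public static void main(String[] args) {
        int leakedPerRequest = 6;                // observed: one socket per cluster node
        long fdLimit = 65535;                    // assumption: open-files limit of the process
        long alreadyOpen = 200;                  // assumption: descriptors a healthy instance holds
        Duration poll = Duration.ofSeconds(30);  // assumption: polling interval of the monitor
        System.out.println(timeUntilLimit(fdLimit, alreadyOpen, leakedPerRequest, poll));
    }
}
```

Plugging in the real limit (ulimit -n) and the real polling interval gives the expected time to failure for a given deployment.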

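The full request context is as follows. The current request is http://192.168.220.140:8291/actuator/health

The request is first handled by HealthEndpointWebExtension.

It then reaches CompositeHealthIndicator, which checks each type of indicator. In our case the problem occurs while checking Redis.

Next comes obtaining the Redis connection and checking its health.

This is where the connection to Redis is actually created.

The core reason every request has to create new connections to Redis lies here.

getNativeConnection throws an exception, so this.connection stays null.

As a result, every call to getNativeConnection creates new connections to Redis,

and sends two requests over them, cluster nodes and client list. Both fail, and the result is "io.lettuce.core.RedisException: Cannot retrieve initial cluster partitions from initial URIs [RedisURI [host='192.168.220.133', port=7001], RedisURI [host='192.168.220.133', port=7002], RedisURI [host='192.168.220.133', port=7003], RedisURI [host='192.168.220.133', port=7004], RedisURI [host='192.168.220.133', port=7005], RedisURI [host='192.168.220.133', port=7006]]"

This is where the exception arises: the error is seen while fetching cluster nodes and client list, and the corresponding exception is thrown.

When they succeed, these two commands return data like the following.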

127.0.0.1:7001> cluster nodes
5cc767285fe4c522e7851d7b2610992160b72e23 192.168.220.133:7001@17001 myself,master - 0 1687364372000 1 connected 0-5460
30f24b7878a573dd5696394de7ea20ea4deb4591 192.168.220.133:7004@17004 slave cfe925b9621300694e1019139a2daef55406eb44 0 1687364373033 4 connected
cfe925b9621300694e1019139a2daef55406eb44 192.168.220.133:7003@17003 master - 0 1687364373000 3 connected 10923-16383
96d9911ec437b7fec1100fe901aa6c19fe38989e 192.168.220.133:7006@17006 slave 15dd54b917bd8c4483616e7029f0c7b3c071078b 0 1687364373000 6 connected
15dd54b917bd8c4483616e7029f0c7b3c071078b 192.168.220.133:7002@17002 master - 0 1687364374043 2 connected 5461-10922
373b275b28079e4ef5bcd581e9877edb8496d712 192.168.220.133:7005@17005 slave 5cc767285fe4c522e7851d7b2610992160b72e23 0 1687364375057 5 connected
127.0.0.1:7001> client list
id=15597 addr=192.168.220.140:52478 fd=20 name= age=801 idle=789 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=16247 addr=192.168.220.140:52616 fd=33 name= age=147 idle=147 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=16214 addr=127.0.0.1:51118 fd=22 name= age=179 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=26 qbuf-free=32742 obl=0 oll=0 omem=0 events=r cmd=client
id=15336 addr=192.168.220.140:52288 fd=14 name= age=1057 idle=1057 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=15348 addr=192.168.220.140:52312 fd=15 name= age=1047 idle=1047 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=15470 addr=192.168.220.140:52410 fd=18 name= age=926 idle=926 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=16039 addr=192.168.220.140:52566 fd=21 name= age=355 idle=355 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=15393 addr=192.168.220.140:52340 fd=16 name= age=1004 idle=953 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=15258 addr=192.168.220.140:52230 fd=12 name= age=1132 idle=1132 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=15325 addr=192.168.220.140:52258 fd=13 name= age=1067 idle=1062 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=15450 addr=192.168.220.140:52380 fd=17 name= age=944 idle=944 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=15575 addr=192.168.220.140:52448 fd=19 name= age=821 idle=821 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client

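With the correct password configured, things look different.

On the first request to http://192.168.220.140:8291/actuator/health, this.connection is null, so getNativeConnection obtains a connection and the check runs on it.

On the second request to http://192.168.220.140:8291/actuator/health, this.connection is no longer null, and the connection cached from the previous call is reused.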
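
The difference between the two cases can be mimicked without Redis at all. The sketch below is a simplified model, not the spring-data-redis or Lettuce code: NodeSocket, Provider, the passwordConfigured flag and the closeOnFailure switch are hypothetical stand-ins, and closing on failure is only one possible guard, shown here for contrast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedConnectionLeakDemo {
    static final AtomicInteger OPEN_SOCKETS = new AtomicInteger();

    /** Stand-in for one TCP connection to a cluster node. */
    static class NodeSocket implements AutoCloseable {
        NodeSocket() { OPEN_SOCKETS.incrementAndGet(); }
        @Override public void close() { OPEN_SOCKETS.decrementAndGet(); }
    }

    /** Stand-in for the object that caches the native connection. */
    static class Provider {
        private final int nodes;
        private final boolean passwordConfigured;
        private final boolean closeOnFailure;
        private Object connection; // stays null for as long as initialization keeps failing

        Provider(int nodes, boolean passwordConfigured, boolean closeOnFailure) {
            this.nodes = nodes;
            this.passwordConfigured = passwordConfigured;
            this.closeOnFailure = closeOnFailure;
        }

        Object getNativeConnection() {
            if (connection != null) {
                return connection; // cached: no new sockets are opened
            }
            List<NodeSocket> sockets = new ArrayList<>();
            try {
                for (int i = 0; i < nodes; i++) {
                    sockets.add(new NodeSocket()); // one socket per cluster node
                }
                if (!passwordConfigured) {
                    // Models the topology query being rejected by the password-protected cluster.
                    throw new IllegalStateException("Cannot retrieve initial cluster partitions");
                }
                connection = sockets;
                return connection;
            } catch (RuntimeException e) {
                if (closeOnFailure) {
                    sockets.forEach(NodeSocket::close); // release what was opened before failing
                }
                throw e;
            }
        }
    }

    /** Runs the given number of health checks against a six-node model and reports the sockets left open. */
    public static int openSocketsAfter(int healthChecks, boolean passwordConfigured, boolean closeOnFailure) {
        OPEN_SOCKETS.set(0);
        Provider provider = new Provider(6, passwordConfigured, closeOnFailure);
        for (int i = 0; i < healthChecks; i++) {
            try {
                provider.getNativeConnection();
            } catch (IllegalStateException expected) {
                // The health check reports DOWN and the next poll tries again.
            }
        }
        return OPEN_SOCKETS.get();
    }

    public static void main(String[] args) {
        System.out.println(openSocketsAfter(5, false, false)); // missing password, nothing closed
        System.out.println(openSocketsAfter(5, false, true));  // missing password, closed on failure
        System.out.println(openSocketsAfter(5, true, false));  // password configured, connection cached
    }
}
```

In this model five failed checks leave 30 sockets open, the close-on-failure variant leaves none, and the correctly configured case settles at 6 no matter how many checks run.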

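Why are StatefulRedisConnection, EpollSocketChannel and LinuxSocket not reclaimed?

LinuxSocket is referenced by EpollSocketChannel.

All of these EpollSocketChannel instances are referenced by a Map, EpollEventLoop.channels.

And EpollEventLoop has a long lifecycle.

StatefulRedisConnection is not covered in detail here. It is likewise referenced by some other object.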

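The scene of the failure

There are more than 30,000 LinuxSocket instances, each corresponding to one socket and occupying one FileDescriptor.

All of them are leaked Redis connections and their related data structures.

The FileDescriptor objects mainly include a large number of FileDescriptors whose fd is -1.

1 sun.nio.ch.ServerSocketChannelImpl, for the listening port.

A dozen or so sun.nio.ch.SocketChannelImpl, mainly used to handle client requests.

3 java.net.SocksSocketImpl, holding the connections to mongo.

The fd situation in actual production is as follows: a large number of socket fds.

The socket details are as follows.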

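The problem in newer versions

Even in spring-data-redis 2.2.x the leak still exists.

During initialize, client.getPartitions throws an exception, so initialized is always false. Every /actuator/health request then goes through client.getPartitions again, creating a batch of TCP connections that are never released.

Judging from the code, the problem appears to still be present.

Checking the fds of the current process confirms that they keep leaking.

The abnormal state looks as follows.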

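Configuring a service other than Redis

The problem can also be reproduced against any reachable TCP service that is not Redis.

So it is fairly general, and also fairly well hidden.

It consumes not only the client's connections but also the resources of the server on the other side.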
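
The last point can be illustrated with nothing but java.net. This is only a sketch under simple assumptions: a throwaway listener on the loopback interface plays the role of "some TCP service that is not Redis", and UnclosedTcpDemo and its methods are hypothetical names. It shows that every connection the client leaves open also pins an accepted socket on the peer, whatever protocol the peer speaks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class UnclosedTcpDemo {
    /**
     * Opens `count` client connections to a local listener without closing them and
     * returns how many accepted sockets the server side is holding at that moment.
     */
    public static int serverSocketsHeld(int count) {
        List<Socket> clients = new ArrayList<>();
        List<Socket> accepted = new ArrayList<>();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) { // port 0: any free port
            for (int i = 0; i < count; i++) {
                clients.add(new Socket(loopback, server.getLocalPort())); // client side, never closed here
                accepted.add(server.accept());                             // server side of the same connection
            }
            return (int) accepted.stream()
                    .filter(s -> s.isConnected() && !s.isClosed())
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Clean up so the demo itself does not leak descriptors.
            closeAll(clients);
            closeAll(accepted);
        }
    }

    private static void closeAll(List<Socket> sockets) {
        for (Socket s : sockets) {
            try {
                s.close();
            } catch (IOException ignored) {
                // Nothing useful to do while cleaning up.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("server-side sockets held: " + serverSocketsHeld(6));
    }
}
```

Without the cleanup in the finally block, both sides would keep their descriptors until the process exits, which is the situation seen on the Redis nodes in the client list output.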

    2024-04-11 15:14:03       14 阅读