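We have a 2-node Kafka cluster with 50 topics (3 partitions each), and every topic has a retention period of 90 days.
Our servers have 16 GB of RAM and 12 CPU cores.
Every week we run into these errors: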
[2020-04-25 15:46:23,969] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:341)
at kafka.network.Acceptor.run(SocketServer.scala:284)
at java.lang.Thread.run(Thread.java:748)
(The same error and stack trace repeat 11 more times, with timestamps from 15:46:23,969 to 15:46:23,970.)
The open file count is 761092.
Output of ulimit -a | grep "open files":
999999
In addition, I added these lines to /etc/security/limits.conf:
kafka soft nproc 999999
* soft nproc 999999
* hard nproc 999999
* soft nofile 999999
* hard nofile 999999
And I added this line to /etc/sysctl.conf: fs.file-max = 4097152
Do we need to add more machines to the cluster? Or increase resources such as RAM or CPU? Or something else?
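
One thing that may be worth checking first: ulimit -a reports the limit of the current shell session, which is not necessarily the limit of the broker process that is already running (a service started by systemd, for example, does not read /etc/security/limits.conf). The following is a minimal sketch, assuming a Linux host with /proc available, that reads the effective "Max open files" limit and the current descriptor count for a given PID. The function names are hypothetical and the broker PID has to be supplied by you.

```python
import os

def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because a limit can be 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line[len("Max open files"):].split()[:2]
            return soft, hard
    return None

def process_fd_usage(pid):
    """Return ((soft, hard), open_fd_count) for one process (Linux only).

    Reading another user's process usually requires root or the same user.
    """
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_max_open_files(f.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return limits, open_fds

# Example (hypothetical PID): print(process_fd_usage(12345))
```

If the soft limit reported here is much lower than 999999, the limits.conf change is not reaching the broker process, and adding machines, RAM, or CPU would not address that.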