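We have 3 Kafka machines in the cluster,
Kafka version 0.10.0.2.6,
and 3 ZooKeeper servers, version 3.4.6.
We have a problem: one of the Kafka brokers fails to start, apparently because of corrupted index files.
We noticed that the Kafka log (/var/log/kafka/server.log) on each Kafka machine reports thousands of corrupted index files, as shown below.
Example from server.log: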
[2019-02-25 12:34:44,907] INFO Completed load of log topic.pop.control.gtp.enrichment-38 with 14 log segments and log end offset 200458117 in 1583 ms (kafka.log.Log)
[2019-02-25 12:34:45,044] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index) has non-zero size but the last offset is 8068079 which is no larger than the base offset 8068079.}. deleting /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.timeindex, /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:45,217] INFO Recovering unflushed segment 8068079 in log topic.pop.control.gtp.state-50. (kafka.log.Log)
[2019-02-25 12:34:45,255] INFO Completed load of log topic.pop.control.gtp.state-50 with 6 log segments and log end offset 8095839 in 347 ms (kafka.log.Log)
[2019-02-25 12:34:45,261] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.index) has non-zero size but the last offset is 1979940988 which is no larger than the base offset 1979940988.}. deleting /var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.timeindex, /var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:47,607] INFO Recovering unflushed segment 1979940988 in log topic.pop.pri.wnr-38. (kafka.log.Log)
[2019-02-25 12:34:48,872] INFO Completed load of log topic.pop.pri.wnr-38 with 21 log segments and log end offset 1980403224 in 3617 ms (kafka.log.Log)
[2019-02-25 12:34:48,935] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.index) has non-zero size but the last offset is 216947511 which is no larger than the base offset 216947511.}. deleting /var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.timeindex, /var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:52,436] INFO Recovering unflushed segment 216947511 in log topic.pop.control.gtp-88. (kafka.log.Log)
[2019-02-25 12:34:54,508] INFO Completed load of log topic.pop.control.gtp-88 with 21 log segments and log end offset 217830559 in 5635 ms (kafka.log.Log)
[2019-02-25 12:34:54,531] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.timeindex, /var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:57,540] INFO Recovering unflushed segment 0 in log topic.pop.pri.lop-10. (kafka.log.Log)
Examples of the corrupted index files:
/var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index
/var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.index
/var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.index
/var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.index
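What is the right way to delete the corrupted index files?
One approach is to find the corrupted index files in server.log (on each Kafka machine), compile a list, and then delete them on each Kafka broker: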
rm -f /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index
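But this approach does not guarantee that server.log mentions every corrupted index file, so there may be more corrupted index files that never appear in the log. Is there a command, or any other way, to list all the corrupted index files?
I think that if we had this list, we could write a simple bash script that walks the list and deletes the files automatically.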
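The log-scraping step described above could be sketched roughly as follows. This is only an illustration: `list_corrupt_indexes` is a hypothetical helper name, and it assumes the WARN message has exactly the format shown in the server.log excerpt. It only prints the paths and does not delete anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kafka): print the unique .index paths
# that server.log reports as corrupted.
# Assumption: the WARN line looks like the excerpt above, i.e. it contains
# "Found a corrupted index file" followed by "index file (<path>.index)".
list_corrupt_indexes() {
  local log_file="$1"
  grep -F 'Found a corrupted index file' "$log_file" \
    | grep -oE 'index file \([^)]+\.index\)' \
    | sed -E 's/^index file \((.*)\)$/\1/' \
    | sort -u
}
```

It could be called as `list_corrupt_indexes /var/log/kafka/server.log > corrupt-indexes.txt` on each broker. Note that it can only find files the broker has already logged, so it does not address the concern about files missing from the log.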
1 Answer
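On startup, Kafka automatically rebuilds any index files that appear to be corrupted. You can see this in the log line, which says "rebuilding index":
Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index) has non-zero size but the last offset is 8068079 which is no larger than the base offset 8068079.}. deleting /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.timeindex, /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index and rebuilding index...
You usually get "corrupted" indexes when Kafka was not shut down cleanly.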
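If you want to confirm that the rebuild actually finished, one rough check is to look for partitions that logged the corruption warning but never logged a later "Completed load of log" line. The sketch below does that with awk. `unrecovered_partitions` is a hypothetical helper name, and it assumes the message formats match the server.log excerpt in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kafka): list partitions whose index was
# reported corrupt but for which no later "Completed load of log" line exists.
# Assumption: log lines follow the format shown in the question's excerpt.
unrecovered_partitions() {
  local log_file="$1"
  awk '
    /Found a corrupted index file/ {
      # Extract "<path>.index" from "index file (<path>.index)".
      if (match($0, /index file \([^)]+\.index\)/)) {
        path = substr($0, RSTART + 12, RLENGTH - 13)
        n = split(path, parts, "/")
        # The partition directory is the component just before the file name.
        pending[parts[n - 1]] = 1
      }
    }
    /Completed load of log / {
      # The partition name is the word right after "of log".
      for (i = 2; i < NF; i++)
        if ($i == "log" && $(i - 1) == "of") delete pending[$(i + 1)]
    }
    END { for (p in pending) print p }
  ' "$log_file" | sort
}
```

For example, `unrecovered_partitions /var/log/kafka/server.log` prints one partition directory name per line. An empty result suggests every flagged partition finished loading; a non-empty one may simply mean the broker is still recovering those segments.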