我们在Linux RHEL 7.6上有3个Kafka代理(3台Linux计算机)
Kafka版本是2.7.X
代理ID为-1010,1011,1012
从Kafka的描述中我们可以看到以下几点
Topic: __consumer_offsets Partition: 0 Leader: none Replicas: 1011,1010,1012 Isr: 1010
Topic: __consumer_offsets Partition: 1 Leader: 1012 Replicas: 1012,1011,1010 Isr: 1012,1011
Topic: __consumer_offsets Partition: 2 Leader: 1011 Replicas: 1010,1012,1011 Isr: 1011,1012
Topic: __consumer_offsets Partition: 3 Leader: none Replicas: 1011,1012,1010 Isr: 1010
Topic: __consumer_offsets Partition: 4 Leader: 1011 Replicas: 1012,1010,1011 Isr: 1011
Topic: __consumer_offsets Partition: 5 Leader: none Replicas: 1010,1011,1012 Isr: 1010
从Zookeeper cli中,我们可以看到代理id 1010
未定义
[zk: localhost:2181(CONNECTED) 10] ls /brokers/ids
[1011, 1012]
并根据log -state-change.log
我们可以看到
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-8 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-11 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-10 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-46 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-45 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-48 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-47 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-49 as the local replica for the partition is in an offline log directory (state.change.logger)
通过ls -ltr,我们可以看到controller.log
和state-change.log
不是从Dec 16
更新
-rwxr-xr-x 1 root kafka 343477146 Dec 16 14:15 controller.log
-rwxr-xr-x 1 root kafka 207911766 Dec 16 14:15 state-change.log
-rw-r--r-- 1 root kafka 68759461 Dec 16 14:15 kafkaServer-gc.log.6.current
-rwxr-xr-x 1 root kafka 6570543 Dec 17 09:42 log-cleaner.log
-rw-r--r-- 1 root kafka 524288242 Dec 20 00:39 server.log.10
-rw-r--r-- 1 root kafka 524289332 Dec 20 01:37 server.log.9
-rw-r--r-- 1 root kafka 524288452 Dec 20 02:35 server.log.8
-rw-r--r-- 1 root kafka 524288625 Dec 20 03:33 server.log.7
-rw-r--r-- 1 root kafka 524288395 Dec 20 04:30 server.log.6
-rw-r--r-- 1 root kafka 524288237 Dec 20 05:27 server.log.5
-rw-r--r-- 1 root kafka 524289136 Dec 20 06:25 server.log.4
-rw-r--r-- 1 root kafka 524288142 Dec 20 07:25 server.log.3
-rw-r--r-- 1 root kafka 524288187 Dec 20 08:21 server.log.2
-rw-r--r-- 1 root kafka 524288094 Dec 20 10:52 server.log.1
-rw-r--r-- 1 root kafka 323361 Dec 20 19:50 kafkaServer-gc.log.0.current
-rw-r--r-- 1 root kafka 323132219 Dec 20 19:50 server.log
-rwxr-xr-x 1 root kafka 15669106 Dec 20 19:50 kafkaServer.out
到目前为止,我们所做是:
我们重启所有3个Zookeeper服务器我们重启所有Kafka经纪人
但Kafka经纪人1010
仍然显示为leader none
,而不是在Zookeeper数据中
其他信息
[zk: localhost:2181(CONNECTED) 11] get /controller
{"version":1,"brokerid":1011,"timestamp":"1640003679634"}
cZxid = 0x4900000b0c
ctime = Mon Dec 20 12:34:39 UTC 2021
mZxid = 0x4900000b0c
mtime = Mon Dec 20 12:34:39 UTC 2021
pZxid = 0x4900000b0c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x27dd7cf43350080
dataLength = 57
numChildren = 0
来自Kafka01
more meta.properties
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010
相关想法
在主题磁盘中,我们有以下文件(除主题分区外)
-rw-r--r-- 1 root kafka 91 Nov 16 07:45 meta.properties
-rw-r--r-- 1 root kafka 161 Dec 15 16:04 cleaner-offset-checkpoint
-rw-r--r-- 1 root kafka 13010 Dec 15 16:20 replication-offset-checkpoint
-rw-r--r-- 1 root kafka 1928 Dec 17 09:42 recovery-point-offset-checkpoint
-rw-r--r-- 1 root kafka 80 Dec 17 09:42 log-start-offset-checkpoint
任何想法,如果删除一个或多个上述文件可以帮助我们的问题?
1条答案
按热度按时间5uzkadbs1#
All that you've shown is that broker 1010 isn't healthy and you probably have unclear leader election disabled.
ls /brokers/ids
shows you the running, healthy brokers from Zookeeper's point of view.Meanwhile, the data in the
/topics
znodes refers to a broker listed in the replica set that is not running, or at least not reporting back to Zookeeper, which you'd see inserver.log
If you have another broker, you can use the partition reassignment tool to manually remove/change broker 1010 data from each partition of all topics it hosts, which would remove old replica information in Zookeeper, and should force a new leader
You shouldn't delete checkpoint files, but you can delete old, rotated log files after you've determined they're not needed anymore