Kafka: leader is "none" and Kafka broker id is not registered in ZooKeeper

2g32fytz  asked on 2022-12-09  in Apache

We have 3 Kafka brokers (3 Linux machines) running on RHEL 7.6.
The Kafka version is 2.7.x.
The broker IDs are 1010, 1011, 1012.
From the topic describe output we can see the following:

Topic: __consumer_offsets       Partition: 0    Leader: none    Replicas: 1011,1010,1012        Isr: 1010
        Topic: __consumer_offsets       Partition: 1    Leader: 1012    Replicas: 1012,1011,1010        Isr: 1012,1011
        Topic: __consumer_offsets       Partition: 2    Leader: 1011    Replicas: 1010,1012,1011        Isr: 1011,1012
        Topic: __consumer_offsets       Partition: 3    Leader: none    Replicas: 1011,1012,1010        Isr: 1010
        Topic: __consumer_offsets       Partition: 4    Leader: 1011    Replicas: 1012,1010,1011        Isr: 1011
        Topic: __consumer_offsets       Partition: 5    Leader: none    Replicas: 1010,1011,1012        Isr: 1010

From the ZooKeeper CLI we can see that broker id 1010 is not registered:

[zk: localhost:2181(CONNECTED) 10] ls /brokers/ids
[1011, 1012]

And according to state-change.log we can see:

[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-8 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-11 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-10 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-46 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-45 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-48 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-47 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-49 as the local replica for the partition is in an offline log directory (state.change.logger)
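The "offline log directory" warnings mean broker 1010 hit an I/O error on one of its `log.dirs` and took the whole directory offline, so the disk itself is worth checking first. A minimal sketch, assuming the default `log.dirs` path and an example bootstrap address (adjust both to your `server.properties`):

```shell
# Assumption: broker 1010 uses Kafka's default log.dirs; change to your real path.
LOG_DIR=/tmp/kafka-logs
mkdir -p "$LOG_DIR"              # no-op on the real broker host; keeps the sketch runnable
df -h "$LOG_DIR"                 # is the filesystem mounted and not full?
# Verify the directory is actually writable by the Kafka user:
touch "$LOG_DIR/.rw_test" && echo "log dir is writable" && rm -f "$LOG_DIR/.rw_test"
# Kafka 2.7 can also report offline log dirs itself (run against the live cluster):
# kafka-log-dirs.sh --bootstrap-server kafka01:9092 --describe --broker-list 1010
```

If the mount is missing, read-only, or full, fixing that and restarting broker 1010 is the first step before touching any metadata.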

From ls -ltr we can see that controller.log and state-change.log have not been updated since Dec 16:

-rwxr-xr-x 1 root kafka 343477146 Dec 16 14:15 controller.log
-rwxr-xr-x 1 root kafka 207911766 Dec 16 14:15 state-change.log
-rw-r--r-- 1 root kafka  68759461 Dec 16 14:15 kafkaServer-gc.log.6.current
-rwxr-xr-x 1 root kafka   6570543 Dec 17 09:42 log-cleaner.log
-rw-r--r-- 1 root kafka 524288242 Dec 20 00:39 server.log.10
-rw-r--r-- 1 root kafka 524289332 Dec 20 01:37 server.log.9
-rw-r--r-- 1 root kafka 524288452 Dec 20 02:35 server.log.8
-rw-r--r-- 1 root kafka 524288625 Dec 20 03:33 server.log.7
-rw-r--r-- 1 root kafka 524288395 Dec 20 04:30 server.log.6
-rw-r--r-- 1 root kafka 524288237 Dec 20 05:27 server.log.5
-rw-r--r-- 1 root kafka 524289136 Dec 20 06:25 server.log.4
-rw-r--r-- 1 root kafka 524288142 Dec 20 07:25 server.log.3
-rw-r--r-- 1 root kafka 524288187 Dec 20 08:21 server.log.2
-rw-r--r-- 1 root kafka 524288094 Dec 20 10:52 server.log.1
-rw-r--r-- 1 root kafka    323361 Dec 20 19:50 kafkaServer-gc.log.0.current
-rw-r--r-- 1 root kafka 323132219 Dec 20 19:50 server.log
-rwxr-xr-x 1 root kafka  15669106 Dec 20 19:50 kafkaServer.out

What we have done so far:
We restarted all 3 ZooKeeper servers, and we restarted all Kafka brokers.
But broker 1010 still shows Leader: none on its partitions and is still missing from the ZooKeeper data.

Additional information

[zk: localhost:2181(CONNECTED) 11] get /controller
{"version":1,"brokerid":1011,"timestamp":"1640003679634"}
cZxid = 0x4900000b0c
ctime = Mon Dec 20 12:34:39 UTC 2021
mZxid = 0x4900000b0c
mtime = Mon Dec 20 12:34:39 UTC 2021
pZxid = 0x4900000b0c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x27dd7cf43350080
dataLength = 57
numChildren = 0

From kafka01:

more meta.properties
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010

Related thoughts

On the topic data disk we have the following files (besides the topic partition directories):

-rw-r--r-- 1 root kafka    91 Nov 16 07:45 meta.properties
-rw-r--r-- 1 root kafka   161 Dec 15 16:04 cleaner-offset-checkpoint
-rw-r--r-- 1 root kafka 13010 Dec 15 16:20 replication-offset-checkpoint
-rw-r--r-- 1 root kafka  1928 Dec 17 09:42 recovery-point-offset-checkpoint
-rw-r--r-- 1 root kafka    80 Dec 17 09:42 log-start-offset-checkpoint

Any idea whether deleting one or more of the above files could help with our problem?

Answer from 5uzkadbs:

All that you've shown is that broker 1010 isn't healthy, and you probably have unclean leader election disabled.
ls /brokers/ids shows you the running, healthy brokers from Zookeeper's point of view.
Meanwhile, the data in the /brokers/topics znodes refers to a broker listed in the replica set that is not running, or at least not reporting back to ZooKeeper, which you'd see in server.log.
If you have another healthy broker, you can use the partition reassignment tool to manually remove/change broker 1010 in the replica set of each partition of all topics it hosts, which would remove the old replica information in ZooKeeper and should force a new leader.
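A minimal sketch of that reassignment for a single partition, assuming 1011 and 1012 are the healthy brokers and using an example file path (repeat the entry for every partition that still lists 1010):

```shell
# Reassignment plan: drop broker 1010 from __consumer_offsets partition 0.
cat > /tmp/reassign.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "__consumer_offsets", "partition": 0, "replicas": [1011, 1012] }
  ]
}
EOF
# Then, against the live cluster (Kafka 2.7 still accepts --zookeeper):
# kafka-reassign-partitions.sh --zookeeper localhost:2181 \
#   --reassignment-json-file /tmp/reassign.json --execute
# kafka-reassign-partitions.sh --zookeeper localhost:2181 \
#   --reassignment-json-file /tmp/reassign.json --verify
```

After --execute, rerun with --verify until the tool reports the reassignment as complete, then check the topic describe output again for a leader.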
You shouldn't delete checkpoint files, but you can delete old, rotated log files after you've determined they're not needed anymore.
