We are trying to run a Kafka cluster on Kubernetes using the nfs provisioner. The cluster comes up fine. However, when we kill one of the Kafka pods, the replacement pod never comes up.
Persistent volume before the pod deletion:
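For context, the claims are provisioned dynamically by the nfs provisioner; the PV backing the data directory can be inspected with something along these lines (the PV name is taken from the mount output below):
kubectl get pvc
kubectl describe pv pvc-ce1461b3-1b38-11e8-a88e-005056073f99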
# mount
10.102.32.184:/export/pvc-ce1461b3-1b38-11e8-a88e-005056073f99 on /opt/kafka/data type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.133.40.245,local_lock=none,addr=10.102.32.184)
# ls -al /opt/kafka/data/logs
total 4
drwxr-sr-x 2 99 99 152 Feb 26 21:07 .
drwxrwsrwx 3 99 99 18 Feb 26 21:07 ..
-rw-r--r-- 1 99 99 0 Feb 26 21:07 .lock
-rw-r--r-- 1 99 99 0 Feb 26 21:07 cleaner-offset-checkpoint
-rw-r--r-- 1 99 99 57 Feb 26 21:07 meta.properties
-rw-r--r-- 1 99 99 0 Feb 26 21:07 recovery-point-offset-checkpoint
-rw-r--r-- 1 99 99 0 Feb 26 21:07 replication-offset-checkpoint
# cat /opt/kafka/data/logs/meta.properties
#
# Mon Feb 26 21:07:08 UTC 2018
version=0
broker.id=1003
Delete the pod:
kubectl delete pod kafka-iced-unicorn-1
Persistent volume re-attached in the newly created pod:
# ls -al /opt/kafka/data/logs
total 4
drwxr-sr-x 2 99 99 180 Feb 26 21:10 .
drwxrwsrwx 3 99 99 18 Feb 26 21:07 ..
-rw-r--r-- 1 99 99 0 Feb 26 21:10 .kafka_cleanshutdown
-rw-r--r-- 1 99 99 0 Feb 26 21:07 .lock
-rw-r--r-- 1 99 99 0 Feb 26 21:07 cleaner-offset-checkpoint
-rw-r--r-- 1 99 99 57 Feb 26 21:07 meta.properties
-rw-r--r-- 1 99 99 0 Feb 26 21:07 recovery-point-offset-checkpoint
-rw-r--r-- 1 99 99 0 Feb 26 21:07 replication-offset-checkpoint
# cat /opt/kafka/data/logs/meta.properties
#
# Mon Feb 26 21:07:08 UTC 2018
version=0
broker.id=1003
We see the following error in the Kafka logs:
[2018-02-26 21:26:40,606] INFO [ThrottledRequestReaper-Produce], Starting (kafka.server.ClientQuotaManager$ThrottledRequestReaper)
[2018-02-26 21:26:40,711] FATAL [Kafka Server 1002], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.io.IOException: Invalid argument
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1012)
at kafka.utils.FileLock.<init>(FileLock.scala:28)
at kafka.log.LogManager$$anonfun$lockLogDirs$1.apply(LogManager.scala:104)
at kafka.log.LogManager$$anonfun$lockLogDirs$1.apply(LogManager.scala:103)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.log.LogManager.lockLogDirs(LogManager.scala:103)
at kafka.log.LogManager.<init>(LogManager.scala:65)
at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:648)
at kafka.server.KafkaServer.startup(KafkaServer.scala:208)
at io.confluent.support.metrics.SupportedServerStartable.startup(SupportedServerStartable.java:102)
at io.confluent.support.metrics.SupportedKafka.main(SupportedKafka.java:49)
[2018-02-26 21:26:40,713] INFO [Kafka Server 1002], shutting down (kafka.server.KafkaServer)
[2018-02-26 21:26:40,715] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
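The fatal error is thrown from kafka.utils.FileLock, which calls File.createNewFile() on the .lock file in the log directory before taking the lock. A rough way to check from inside the replacement pod whether the NFS mount itself rejects exclusive creates (the test filename is arbitrary, and this is only an approximation of what the JVM does) is:
( set -C; : > /opt/kafka/data/logs/.lock-test )   # with noclobber set, bash opens the file with O_EXCL, similar to createNewFile()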
The only way we have found to get around this is to delete the persistent volume claim and then force-delete the pod again. Alternatively, we can use a storage provisioner other than nfs (rook works fine in this scenario).
Has anyone else run into this issue with the nfs provisioner?
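For reference, the workaround looks roughly like this; the PVC name is an assumption based on the usual <claimTemplate>-<pod> naming for StatefulSets, so verify it with kubectl get pvc first:
kubectl delete pvc datadir-kafka-iced-unicorn-1   # assumed PVC name
kubectl delete pod kafka-iced-unicorn-1 --grace-period=0 --force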