Can't understand Kafka Connect in distributed mode

cfh9epnr posted on 2021-06-05 in Kafka

I started Kafka Connect in standalone mode as follows:

/usr/local/confluent/bin/connect-standalone /usr/local/confluent/etc/kafka/connect-standalone.properties /usr/local/confluent/etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties

After that, I created a connector with all the details through the REST API, like this:

curl  -X POST -H "Content-Type: application/json" --data '{"name":"elastic-search-sink-audit","config":{"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector","tasks.max":"5","topics":"fsp-AUDIT_EVENT_DEMO","key.ignore":"true","connection.url":"https://**.amazonaws.com","type.name":"kafka-connect-distributed","name":"elastic-search-sink-audit","errors.tolerance":"all","errors.deadletterqueue.topic.name":"fsp-dlq-audit-event"}}' http://localhost:8083/connectors | jq
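The one-line curl above is hard to read and edit. A minimal sketch, assuming the same worker at localhost:8083: keep the `{"name": ..., "config": {...}}` object in a JSON file and post it with curl's `--data @file`. The helper name `create_connector` and the file name `elastic-search-sink-audit.json` are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: create_connector is a hypothetical helper, not part of Kafka Connect.
# CONNECT_URL defaults to the worker used in the question.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# POST a connector definition ({"name": ..., "config": {...}}) stored in a JSON file
# to the worker's /connectors endpoint.
create_connector() {
  local config_file="$1"
  curl -s -X POST -H "Content-Type: application/json" \
       --data @"$config_file" "$CONNECT_URL/connectors"
}
```

Usage: `create_connector elastic-search-sink-audit.json | jq`. The file holds the same JSON that the question passes inline to `--data`.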

After that, when I checked the status, I could see 5 tasks running:

curl  localhost:8083/connectors/elastic-search-sink-audit/tasks | jq

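Question 1:
Does this mean I am running my Kafka Connect connector in distributed mode, or only in standalone mode?
Question 2:
Do I have to modify the connect-distributed.properties file and start it the same way as standalone?
Question 3:
Currently I run my whole setup on a single EC2 instance. If I add 5 more EC2 instances to make the connector more parallel and faster, how do I do that? How will Kafka Connect know that 5 more EC2 instances were added and that it must share the workload with them?
Question 4:
Do I have to run, start, and create Kafka Connect on all the EC2 instances before it starts? How can I confirm that all 5 EC2 instances are running the same connector correctly?
Finally, I tried starting the connector in distributed mode. First I started it like this: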

/usr/local/confluent/bin/connect-distributed /usr/local/confluent/etc/kafka/connect-distributed.properties /usr/local/confluent/etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties

Then, in another session, I submitted the connector through the REST API like this:

curl  -X POST -H "Content-Type: application/json" --data '{"name":"elastic-search-sink-audit","config":{"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector","tasks.max":"5","topics":"fsp-AUDIT_EVENT_DEMO","key.ignore":"true","connection.url":"https://**.amazonaws.com","type.name":"kafka-connect-distributed","name":"elastic-search-sink-audit","errors.tolerance":"all","errors.deadletterqueue.topic.name":"fsp-dlq-audit-event"}}' http://localhost:8083/connectors | jq

But as soon as I did this, I started getting errors like these:

rror: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,551] WARN [Producer clientId=producer-3] Got error produce response with correlation id 159 on topic-partition connect-configs-0, retrying (2147483496 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,652] WARN [Producer clientId=producer-3] Got error produce response with correlation id 160 on topic-partition connect-configs-0, retrying (2147483495 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,753] WARN [Producer clientId=producer-3] Got error produce response with correlation id 161 on topic-partition connect-configs-0, retrying (2147483494 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,854] WARN [Producer clientId=producer-3] Got error produce response with correlation id 162 on topic-partition connect-configs-0, retrying (2147483493 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,956] WARN [Producer clientId=producer-3] Got error produce response with correlation id 163 on topic-partition connect-configs-0, retrying (2147483492 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)

Finally, the curl request to create the connector timed out:

{ "error_code": 500, "message": "Request timed out" }

Please help me understand this.

bpsygsoo


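Both modes start the REST API.
Distributed mode does not accept connector property files on the command line; you must POST the connector config to the REST API. There is no reason to POST in standalone mode, because the connector you passed on the command line is already running.
Distributed mode is recommended because connector state is stored in Kafka topics, rather than in a file on the single machine running standalone mode.
For more details, see Kafka Connect Concepts.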
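A minimal sketch of the distributed-mode startup, assuming the /usr/local/confluent layout from the question: the worker is given only its own properties file (no quickstart-elasticsearch.properties), and the connector is then created through the REST API as in the question. `start_distributed_worker` and `CONFLUENT_HOME` are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: start_distributed_worker and CONFLUENT_HOME are hypothetical names.
# The default matches the install path used in the question.
CONFLUENT_HOME="${CONFLUENT_HOME:-/usr/local/confluent}"

# Start a distributed worker with ONLY the worker config. Connector configs are
# not passed here; they are submitted afterwards via POST /connectors.
start_distributed_worker() {
  "$CONFLUENT_HOME/bin/connect-distributed" \
    "$CONFLUENT_HOME/etc/kafka/connect-distributed.properties"
}
```

Usage: run `start_distributed_worker` on every machine, then submit the connector JSON once, to port 8083 of any one worker.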
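"How will Kafka know that 5 more EC2 instances were added and that it must share the workload?"
"Do I have to run, start, and create Kafka Connect on all the EC2 instances before it starts? How can I confirm that all 5 EC2 instances are running the same connector correctly?"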
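Well, your EC2 machines don't know to start any process unless they are part of some distributed cluster, so you must start distributed mode on each of them, using the same settings (Confluent's Ansible repo makes this very easy). Workers that share the same group.id and the same internal topics join one Connect cluster and rebalance connectors and tasks among themselves; that is how the cluster learns about new machines.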
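A minimal sketch of connect-distributed.properties for such a cluster. The property names are standard Kafka Connect worker settings; the broker hosts broker1 to broker3, the plugin.path directory, and the IP address are placeholders (localhost only works when the broker runs on the same machine).

```properties
# Same on every EC2 worker: these settings make the workers one Connect cluster
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
group.id=connect-cluster
# Connector configs, offsets and task status live in these Kafka topics
# (standalone mode uses a local offset.storage.file.filename instead)
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# Every worker needs the Elasticsearch connector plugin installed (placeholder path)
plugin.path=/usr/local/confluent/share/java

# Different on each worker: the address other workers use to reach its REST API
rest.advertised.host.name=10.0.0.11
```

After starting connect-distributed with this file on each machine, POST the connector once to any worker; the cluster stores it in connect-configs and spreads the tasks (up to tasks.max=5) across the workers.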
You can hit the /connectors/<name>/status endpoint on any Connect worker to see which worker addresses are running which tasks.
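A minimal sketch of that check, assuming a worker at localhost:8083 and jq installed (the question already uses it). `GET /connectors/<name>/status` returns the connector state plus one entry per task with its `state` and `worker_id`; the helper names `connector_status` and `tasks_by_worker` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: connector_status and tasks_by_worker are hypothetical helpers.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Raw status JSON: {"name":..., "connector":{...}, "tasks":[{"id":..,"state":..,"worker_id":..}], ...}
connector_status() {
  curl -s "$CONNECT_URL/connectors/$1/status"
}

# One line per task: task id, state, and the worker (host:port) running it.
tasks_by_worker() {
  connector_status "$1" | jq -r '.tasks[] | "\(.id) \(.state) \(.worker_id)"'
}
```

Usage: `tasks_by_worker elastic-search-sink-audit`. Once the cluster has rebalanced, the tasks should show several different worker_id values instead of a single address.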
NOT_ENOUGH_REPLICAS
This happens because you don't have enough brokers for the internal Kafka Connect topics that track state.
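Assuming a single-broker development setup on the same EC2 instance, a minimal sketch of the usual fix in connect-distributed.properties is to lower the replication factor of the three internal topics to 1 (use 3 in production with at least 3 brokers). These settings only apply when Connect creates the topics, so an existing connect-configs topic may need to be deleted or replaced by a new topic name. The error code itself means a partition has fewer in-sync replicas than the broker's min.insync.replicas, so that broker setting is also worth checking.

```properties
# Single-broker development only: a topic cannot have more replicas than brokers
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
```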
