When I use the same key, Kafka JDBC Connect does not publish the messages to one partition

Posted by uujelgoq on 2021-06-07 in Kafka


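Messages with the same key should go to the same partition of a topic, but the Kafka JDBC source connector is publishing messages to different partitions.
I created a student topic with 5 partitions.
I created a student table with the following script: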

```sql
create TABLE student (
  std_id INT AUTO_INCREMENT PRIMARY KEY,
  std_name VARCHAR(50),
  class_name VARCHAR(50),
  father_name VARCHAR(50),
  mother_name VARCHAR(50),
  school VARCHAR(50)
);
```

My JDBC source quickstart properties file is as follows:

```properties
query=select * from student
tasks.max=1
mode=incrementing
incrementing.column.name=std_id
topic.prefix=student-topic-in
numeric.mapping=best_fit
timestamp.delay.interval.ms=10
transforms=CreateKey,ExtractKey,ConvertDate,Replace,InsertPartition,InsertTopic
transforms.CreateKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.CreateKey.fields=class_name
transforms.ExtractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractKey.field=class_name
```
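To make the key handling concrete, here is a minimal, dependency-free sketch of what the `CreateKey` (`ValueToKey`) and `ExtractKey` (`ExtractField$Key`) steps do to one row. It is a simplified model, not the Kafka Connect API: plain `Map`s stand in for `SourceRecord`/`Struct`, and `KeyTransformsSketch`, `valueToKey` and `extractField` are hypothetical names. It shows that after both steps the record key is just the bare `class_name` value, for example `15`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of the two key SMTs configured above.
// Real Kafka Connect SMTs operate on SourceRecord/Struct; plain Maps stand in here.
final class KeyTransformsSketch {

    // Models ValueToKey with fields=class_name: build a key holding only the listed fields.
    static Map<String, Object> valueToKey(Map<String, Object> value, List<String> fields) {
        Map<String, Object> key = new LinkedHashMap<>();
        for (String f : fields) {
            key.put(f, value.get(f));
        }
        return key;
    }

    // Models ExtractField$Key with field=class_name: replace the key with that one field's value.
    static Object extractField(Map<String, Object> key, String field) {
        return key.get(field);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("std_id", 145);
        row.put("std_name", "pranavi311");
        row.put("class_name", "15");

        Map<String, Object> structKey = valueToKey(row, List.of("class_name"));
        Object key = extractField(structKey, "class_name");
        System.out.println(structKey); // {class_name=15}
        System.out.println(key);       // 15
    }
}
```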

When I insert details of students from the same class into the DB table, all the messages are published to one partition.

```
student-topic-in 3 "15" @ 35: {"std_id":145,"std_name":"pranavi311","class_name":"15","father_name":"abcd1","mother_name":"efgh1","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "15" @ 36: {"std_id":146,"std_name":"pranavi321","class_name":"15","father_name":"abcd2","mother_name":"efgh2","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "15" @ 37: {"std_id":147,"std_name":"pranavi331","class_name":"15","father_name":"abcd3","mother_name":"efgh3","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "15" @ 38: {"std_id":148,"std_name":"pranavi341","class_name":"15","father_name":"abcd4","mother_name":"efgh4","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "15" @ 39: {"std_id":149,"std_name":"pranavi351","class_name":"15","father_name":"abcd5","mother_name":"efgh5","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "15" @ 40: {"std_id":150,"std_name":"pranavi361","class_name":"15","father_name":"abcd6","mother_name":"efgh6","school_name":"CSI","partition":null,"topic":"student-topic-in"}

% Reached end of topic student-topic-in [3] at offset 41
```
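But if I insert details of students from different classes, they are still published mostly to one partition (five of the six messages below go to partition 3).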

```
student-topic-in 3 "11" @ 41: {"std_id":151,"std_name":"pranavi311","class_name":"11","father_name":"abcd1","mother_name":"efgh1","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "12" @ 42: {"std_id":152,"std_name":"pranavi321","class_name":"12","father_name":"abcd2","mother_name":"efgh2","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "13" @ 43: {"std_id":153,"std_name":"pranavi331","class_name":"13","father_name":"abcd3","mother_name":"efgh3","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "14" @ 44: {"std_id":154,"std_name":"pranavi341","class_name":"14","father_name":"abcd4","mother_name":"efgh4","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 3 "15" @ 45: {"std_id":155,"std_name":"pranavi351","class_name":"15","father_name":"abcd5","mother_name":"efgh5","school_name":"CSI","partition":null,"topic":"student-topic-in"}
student-topic-in 0 "16" @ 31: {"std_id":156,"std_name":"pranavi361","class_name":"16","father_name":"abcd6","mother_name":"efgh6","school_name":"CSI","partition":null,"topic":"student-topic-in"}

% Reached end of topic student-topic-in [3] at offset 46
```
I am using the following command to print the details:

```shell
kafkacat -b localhost:9092 -C -t student-topic-in -f '%t %p %k @ %o: %s\n'
```

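My expectation is that the messages for each class's students should be published to a specific partition (in the JDBC connector I specified the class name as the key), but it is not working.
What exactly am I missing? How can I publish each class's students to a specific partition?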

apache-kafka jdbc apache-kafka-connect confluent-platform

Source: https://stackoverflow.com/questions/54307428/when-i-use-same-key-kafka-jdbc-connect-not-publishing-the-messages-to-one-parti

2 Answers


ki0zmccv1#

I solved this problem by using the string converter: `key.converter=org.apache.kafka.connect.storage.StringConverter`
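Why a different key converter changes the result: the partitioner hashes the key bytes produced by the converter, not the Connect-level key. The sketch below (hypothetical class and method names, no Kafka dependency) approximates the bytes each converter emits for the key `15`, assuming `JsonConverter` runs with schemas disabled and `StringConverter` uses its default UTF-8 encoding. The JSON variant includes the surrounding quote characters (byte 34).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrates, without Kafka on the classpath, why switching key.converter changes the key bytes.
// Assumption: JsonConverter runs with schemas disabled, so a string key is written as a quoted
// JSON string literal; StringConverter writes the raw UTF-8 text.
final class KeyConverterBytes {

    // Approximates JsonConverter output for a simple string key, e.g. the JSON literal "15".
    // Only valid for strings that contain no characters JSON would need to escape.
    static byte[] jsonStringKey(String key) {
        return ("\"" + key + "\"").getBytes(StandardCharsets.UTF_8);
    }

    // Approximates StringConverter output with its default UTF-8 encoding.
    static byte[] stringKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(jsonStringKey("15"))); // [34, 49, 53, 34]
        System.out.println(Arrays.toString(stringKey("15")));     // [49, 53]
    }
}
```

Because the hash input changes, the partition a given class lands in can change as well; equal keys still always land in the same partition, whichever converter is used.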

2021-06-07

zdwk9cvp2#

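In your case, everything is working as expected.
If you check the Kafka Connect source code, you can see in the `WorkerSourceTask::sendRecords` method that the transformations are applied to each record before the producer sends it, and the transformed record is then converted to a byte array by the `Converter`:

```java
private boolean sendRecords() {
    // ...
    final SourceRecord record = transformationChain.apply(preTransformRecord);
    final ProducerRecord<byte[], byte[]> producerRecord = convertTransformedRecord(record);
    // ...
}
```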

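In your case, the transformations are `CreateKey,ExtractKey,ConvertDate,Replace,InsertPartition,InsertTopic` and the converter is `org.apache.kafka.connect.json.JsonConverter`. The converter uses the schema to map the key to a byte array, and that byte array is what is sent to Kafka: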

```java
@Override
public byte[] fromConnectData(String topic, Schema schema, Object value) {
    JsonNode jsonValue = enableSchemas ? convertToJsonWithEnvelope(schema, value) : convertToJsonWithoutEnvelope(schema, value);
    try {
        return serializer.serialize(topic, jsonValue);
    } catch (SerializationException e) {
        throw new DataException("Converting Kafka Connect data to byte[] failed due to serialization error: ", e);
    }
}
```

You have disabled schemas, so for the keys the following calls produce:

- `11`: `serializer.serialize(topic, new TextNode("11"))` = `[34, 49, 49, 34]`
- `12`: `serializer.serialize(topic, new TextNode("12"))` = `[34, 49, 50, 34]`
- `13`: `serializer.serialize(topic, new TextNode("13"))` = `[34, 49, 51, 34]`
- `14`: `serializer.serialize(topic, new TextNode("14"))` = `[34, 49, 52, 34]`
- `15`: `serializer.serialize(topic, new TextNode("15"))` = `[34, 49, 53, 34]`
- `16`: `serializer.serialize(topic, new TextNode("16"))` = `[34, 49, 54, 34]`
Each message is sent by the `Producer` to some partition. Which partition a message goes to depends on the `Partitioner` (`org.apache.kafka.clients.producer.Partitioner`). Kafka Connect uses the default one, `org.apache.kafka.clients.producer.internals.DefaultPartitioner`. Under the hood, `DefaultPartitioner` computes the partition of a keyed record with:

`org.apache.kafka.common.utils.Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;`

Applied to your parameters (5 partitions and the byte arrays of the keys), this gives:

- `Utils.toPositive(Utils.murmur2(new byte[]{34,49,49,34})) % 5` = 3
- `Utils.toPositive(Utils.murmur2(new byte[]{34,49,50,34})) % 5` = 3
- `Utils.toPositive(Utils.murmur2(new byte[]{34,49,51,34})) % 5` = 3
- `Utils.toPositive(Utils.murmur2(new byte[]{34,49,52,34})) % 5` = 3
- `Utils.toPositive(Utils.murmur2(new byte[]{34,49,53,34})) % 5` = 3
- `Utils.toPositive(Utils.murmur2(new byte[]{34,49,54,34})) % 5` = 0
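This calculation can be reproduced without a running cluster. The following is a sketch that transcribes the keyed-record path (`toPositive(murmur2(keyBytes)) % numPartitions`) into plain Java. It assumes the transcription matches `org.apache.kafka.common.utils.Utils.murmur2` in your Kafka version; `KeyedPartitionSketch` and `partitionFor` are hypothetical names. If the transcription matches, running `main` should print the same partitions as the list above.

```java
import java.nio.charset.StandardCharsets;

// Dependency-free transcription of the keyed-record path:
// partition = toPositive(murmur2(keyBytes)) % numPartitions.
// Assumption: mirrors Kafka's Utils.murmur2/toPositive; verify against your Kafka version.
final class KeyedPartitionSketch {

    // 32-bit MurmurHash2 with the seed and constants used by Kafka's partitioner.
    static int murmur2(final byte[] data) {
        final int length = data.length;
        final int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;
        final int length4 = length / 4;

        // Mix 4 bytes at a time, read little-endian.
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix the remaining 1-3 bytes; the cases intentionally fall through.
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }

        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clears the sign bit (unlike Math.abs, which stays negative for Integer.MIN_VALUE).
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 5; // the topic in the question has 5 partitions
        for (int cls = 11; cls <= 16; cls++) {
            // Key bytes as JsonConverter writes them with schemas disabled: "11", "12", ...
            byte[] key = ("\"" + cls + "\"").getBytes(StandardCharsets.UTF_8);
            System.out.println(cls + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Note that with 6 classes and 5 partitions, at least two classes must share a partition no matter how the keys are encoded; the partitioner only guarantees that equal key bytes map to the same partition.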
Hopefully this more or less explains what is happening and why.

2021-06-07
