Timestamp in Avro schema produces incompatible value validation in Kafka Connect JDBC

Posted by x7rlezfr on 2021-06-07 in Kafka

The JDBC sink connector produces the following error:

org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type INT64: class java.util.Date for field: "some_timestamp_field"
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:242)
at org.apache.kafka.connect.data.Struct.put(Struct.java:216)
at org.apache.kafka.connect.transforms.Cast.applyWithSchema(Cast.java:151)
at org.apache.kafka.connect.transforms.Cast.apply(Cast.java:107)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:480)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Avro schema registered by the source JDBC connector (MySQL):

{  
   "type":"record",
   "name":"ConnectDefault",
   "namespace":"io.confluent.connect.avro",
   "fields":[  
      ...
      {  
         "name":"some_timestamp_field",
         "type":{  
            "type":"long",
            "connect.version":1,
            "connect.name":"org.apache.kafka.connect.data.Timestamp",
            "logicalType":"timestamp-millis"
         }
      },
      ...
   ]
}

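The exception appears to be raised by this code block: https://github.com/apache/kafka/blob/f0282498e7a312a977acb127557520def338d45c/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L239
So in the Avro schema, the timestamp field is registered as INT64 with the correct (timestamp) logical type. But Connect reads the schema type as INT64 and compares it against the value type java.util.Date.
Is this a bug, or is there a workaround? Maybe I am missing something, since this looks like a standard Connect setup.
Thanks in advance.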
Update
Sink connector configuration:

{
    "name": "sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "topic",
        "connection.url": "jdbc:postgresql://host:port/db",
        "connection.user": "user",
        "connection.password": "password",

        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://host:port",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://host:port",

        "auto.create": "true",
        "insert.mode": "upsert",
        "pk.mode": "record_value",
        "pk.fields": "id"
    }
}

Deserialized data in Kafka:

{
   "id":678148,
   "some_timestamp_field":1543806057000,
   ...
}
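
To make the relationship between the value on the wire and the Java type concrete: with the timestamp-millis logical type the Avro long holds milliseconds since the Unix epoch, and the Connect Timestamp logical type named in the schema carries it as a java.util.Date, which is the class reported in the exception. A minimal standard-library sketch follows; the class and method names are hypothetical and not part of Kafka Connect.

```java
import java.time.Instant;
import java.util.Date;

// Hypothetical helper, not Kafka Connect code: shows how the epoch-millisecond
// long stored in Avro corresponds to a java.util.Date and back.
class TimestampMillisDemo {

    // Avro timestamp-millis long -> java.util.Date
    static Date toDate(long epochMillis) {
        return new Date(epochMillis);
    }

    // java.util.Date -> Avro timestamp-millis long
    static long toEpochMillis(Date date) {
        return date.getTime();
    }

    public static void main(String[] args) {
        long raw = 1543806057000L; // sample value from the deserialized record
        Date asDate = toDate(raw);
        System.out.println(Instant.ofEpochMilli(raw));      // 2018-12-03T03:00:57Z
        System.out.println(toEpochMillis(asDate) == raw);   // true
    }
}
```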

0mkxixxg1#

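We came up with a workaround for this problem. Our goal was to convert id from BIGINT to a string (TEXT/VARCHAR) and save the record in the downstream DB.
But because of an issue (probably https://issues.apache.org/jira/browse/kafka-5891), casting the id field did not work. Kafka Connect also tried to validate the timestamp field in the cast chain, but it misread that field's schema type/name, which resulted in a type mismatch (see the record body and error log above).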
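
The failure mode can be modelled without Kafka on the classpath. The sketch below assumes that the transform rebuilds each field schema from its primitive type alone and loses the logical name; under that assumption a validator that looks up the expected Java class by logical name first, and by primitive type second, falls back to the class for INT64 and rejects the java.util.Date. This is a simplified model, not the Kafka Connect source: FieldSchema, accepts, and rebuildFromTypeOnly are hypothetical names.

```java
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of "logical name first, primitive type second" validation.
// NOT the Kafka Connect implementation; all names here are illustrative.
class SchemaValidationSketch {

    static final String TIMESTAMP = "org.apache.kafka.connect.data.Timestamp";

    // Hypothetical stand-in for a field schema: primitive type plus optional logical name.
    static final class FieldSchema {
        final String type;
        final String logicalName; // null when no logical type is attached

        FieldSchema(String type, String logicalName) {
            this.type = type;
            this.logicalName = logicalName;
        }
    }

    static final Map<String, List<Class<?>>> LOGICAL_CLASSES = new HashMap<>();
    static final Map<String, List<Class<?>>> PRIMITIVE_CLASSES = new HashMap<>();
    static {
        LOGICAL_CLASSES.put(TIMESTAMP, Collections.<Class<?>>singletonList(Date.class));
        PRIMITIVE_CLASSES.put("INT64", Collections.<Class<?>>singletonList(Long.class));
        PRIMITIVE_CLASSES.put("STRING", Collections.<Class<?>>singletonList(String.class));
    }

    // Returns true when the value's class matches what the schema expects.
    static boolean accepts(FieldSchema schema, Object value) {
        List<Class<?>> expected =
                schema.logicalName == null ? null : LOGICAL_CLASSES.get(schema.logicalName);
        if (expected == null) {
            expected = PRIMITIVE_CLASSES.get(schema.type); // fall back to the primitive type
        }
        if (expected == null) {
            return false;
        }
        for (Class<?> candidate : expected) {
            if (candidate.isInstance(value)) {
                return true;
            }
        }
        return false;
    }

    // Assumption under test: the rebuilt schema keeps the type but drops the logical name.
    static FieldSchema rebuildFromTypeOnly(FieldSchema original) {
        return new FieldSchema(original.type, null);
    }

    public static void main(String[] args) {
        FieldSchema original = new FieldSchema("INT64", TIMESTAMP);
        Date value = new Date(1543806057000L);
        System.out.println(accepts(original, value));                      // true
        System.out.println(accepts(rebuildFromTypeOnly(original), value)); // false
    }
}
```

In this model the second check fails in the same way as the reported error: the schema says INT64 while the value is still a java.util.Date.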
So we did the following: extract only the id field as key -> execute cast transform on the key -> it works as key does not contain timestamp field.
Here is the workaround configuration:

{
    "name": "sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "topic",
        "connection.url": "jdbc:postgresql://host:port/db",
        "connection.user": "user",
        "connection.password": "password",

        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://host:port",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://host:port",

        "transforms": "createKey,castKeyToString",
        "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
        "transforms.createKey.fields": "id",

        "transforms.castKeyToString.type": "org.apache.kafka.connect.transforms.Cast$Key",
        "transforms.castKeyToString.spec": "id:string",

        "auto.create": "true",
        "insert.mode": "upsert",
        "pk.mode": "record_key",
        "pk.fields": "id"
    }
}
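
The effect of the two chained transforms in this configuration can be illustrated with plain maps standing in for Connect records. This is only a sketch of the data flow, not the real ValueToKey or Cast transforms; valueToKey and castFieldToString are hypothetical helpers.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative data flow only: maps stand in for Connect Structs.
class KeyCastWorkaroundSketch {

    // Models the createKey step: copy the listed value fields into a new key.
    static Map<String, Object> valueToKey(Map<String, Object> value, String... fields) {
        Map<String, Object> key = new LinkedHashMap<>();
        for (String field : fields) {
            key.put(field, value.get(field));
        }
        return key;
    }

    // Models the castKeyToString step with spec "id:string": only the key is touched.
    static Map<String, Object> castFieldToString(Map<String, Object> key, String field) {
        Map<String, Object> result = new LinkedHashMap<>(key);
        result.put(field, String.valueOf(key.get(field)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("id", 678148L);
        value.put("some_timestamp_field", new Date(1543806057000L));

        Map<String, Object> key = castFieldToString(valueToKey(value, "id"), "id");
        System.out.println(key); // {id=678148}
    }
}
```

Because the cast runs on a key that holds only id, the timestamp field in the value never passes through it.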

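Disclaimer: this is not a proper solution, only a workaround. The bug in the cast transform should be fixed. In my opinion, the cast should only touch the fields designated for casting, not the other fields in the message.
Have a nice day.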
