gobblin错误:无法转换field:derivedwatermarkcolumn for 值:“”表示记录:

x4shl7ld  于 2021-05-27  发布在  Hadoop
关注(0)|答案(1)|浏览(381)

我正在尝试将数据从mysql表摄取到hdfs。但它给了我以下的错误

IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582873318919_0] 504 - Processing record incurs an unexpected exception:

java.lang.RuntimeException: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record: 
{"id":"1","name":"abc","password":"abc","derivedwatermarkcolumn":"abc"}
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:647)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    ... 22 more
IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582893709536_0] 567 - Task task_GobblinMySql_1582893709536_0 failed
java.lang.RuntimeException: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:505)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    ... 12 more

下面是记录模式

IST INFO  [JobScheduler-0] org.apache.gobblin.source.jdbc.JdbcExtractor [demo_user_1582893709536_0] 361 - Schema:[

{"columnName":"id","dataType":{"type":"int"},"isWaterMark":false,"primaryKey":1,"length":0,"precision":10,"scale":0,"isNullabl
e":false,"format":"","comment":"","isUnique":false},

{"columnName":"name","dataType":"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNulla
ble":true,"format":"","comment":"","isUnique":false},

{"columnName":"password","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNulla
ble":true,"format":"","comment":"","isUnique":false},

{"columnName":"derivedwatermarkcolumn","dataType":{"type":"timestamp"},"isWaterMark":true,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNul
lable":false,"comment":"Default watermark column","isUnique":false}]

水印派生的水印列的数据类型是timestamp,但在记录中是字符串“”。
作业和属性文件如下所示。
mysql.pull文件


# Job properties

job.name=GobblinMySql
job.group=MySql
job.description=Data pull from MySql
job.lock.enabled=False

# Extract properties

extract.namespace=demo
extract.table.type=snapshot_only
extract.table.name=user
extract.delta.fields=name,password
extract.primary.key.fields=id

# Property to consider the extract as full dump

extract.is.full=true

# Source properties

source.querybased.schema=demo
source.entity=user
source.querybased.extract.type=snapshot

mysql.properties属性


# Source properties - source class to extract data from Mysql Source

source.class=org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource

# Source properties

source.max.number.of.partitions=1
source.querybased.partition.interval=1
source.querybased.is.compression=false
source.querybased.watermark.type=timestamp

# Source connection properties

source.conn.driver=com.mysql.jdbc.Driver
source.conn.username=root
source.conn.password=root
source.conn.host=localhost
source.conn.port=3306
source.conn.timeout=1500

# Converter properties - Record from mysql source will be processed by the below series of converters

converter.classes=org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter

# date columns format

converter.avro.timestamp.format=YYYY-MM-DD HH:MM:SS
converter.avro.date.format=yyyy-MM-dd
converter.avro.time.format=HH:mm:ss

# Qualitychecker properties

qualitychecker.task.policies=org.apache.gobblin.policies.count.RowCountPolicy,org.apache.gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL

# Publisher properties

data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher

是什么导致配置文件中出现此错误?如果有人知道,请帮忙。

xbp102n0

xbp102n01#

水印列的名称似乎来自extract.delta.fields属性。在您的示例中,它被设置为“name,password”,因此名称被视为水印。尝试将其设置为“derivedwatermarkcolumn”。
我是如何发现的:我查看了mysqlsource类的代码,找到了提到水印的地方,然后使用intellij的inspector找出了数据的来源。您可以通过上下文菜单->分析->分析数据流来获取它。

相关问题