Apache — Google Data Fusion pipeline fails

nzk0hqpo · posted 2023-04-07 in Apache

I am trying to build a Google Data Fusion pipeline that loads data from MS SQL Server into BigQuery. The source (MS SQL Server 2016 Standard) runs on a GCP VM, and I can connect to the SQL instance over its public IP without any problem. I am using the BigQuery connector as the sink.
As the JDBC driver I use the Microsoft SQL Server JDBC Driver v6.0.jre7, which is available from the Data Fusion Hub.

Here are the details of the Data Fusion instance:

  • Edition: Basic
  • Version: 6.8.1 (latest)
  • Private IP: disabled

When I run the pipeline, it fails with the following error messages. Can anyone help me resolve this? What am I missing in the configuration?

04/07/2023 0:42:40
INFO
Launch main class org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main([--driver-url, spark://CoarseGrainedScheduler@cdap-mypipe-98803685-d4b2-11ed-a81e-26b0641a83bd-w-1.us-central1-f.c.marine-fusion-270120.internal:35779, --executor-id, 1, --hostname, cdap-mypipe-98803685-d4b2-11ed-a81e-bd-w-0.us-central1-f.c.marine-fusion-2.internal, --cores, 1, --app-id, application_1680809963358_0002, --resourceProfileId, 0, --user-class-path, file:/hadoop/yarn/nm-local-dir/usercache/yarn/appcache/application_1680809963358_0002/container_1680809963358_0002_01_000002/__app__.jar])
04/07/2023 0:42:46
WARN
Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HftpFileSystem not found
04/07/2023 0:42:46
WARN
Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HsftpFileSystem not found
04/07/2023 0:42:48
ERROR
Aborting task
04/07/2023 0:42:48
ERROR
Task attempt_202304061942308162483986510619117_0003_r_000000_0 aborted.
04/07/2023 0:42:48
ERROR
Exception in task 0.0 in stage 0.0 (TID 0)
04/07/2023 0:42:48
WARN
Lost task 0.0 in stage 0.0 (TID 0) (cdap-mypipe-98803685-d4b2-11ed-a81e-26b0641a83bd-w-1.us-central1-f.c.marine-fusion-270120.internal executor 2): org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:162)
	at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:505)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:508)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 1
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:90)
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:37)
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:58)
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:32)
	at io.cdap.cdap.etl.spark.io.TrackingRecordWriter.write(TrackingRecordWriter.java:41)
	at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:367)
	at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:137)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
	at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:134)
	... 9 more
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 1
	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:159)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:90)
	at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:191)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:159)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
	... 18 more
04/07/2023 0:42:49
ERROR
Aborting task
04/07/2023 0:42:49
ERROR
Task attempt_202304061942308162483986510619117_0003_r_000000_1 aborted.
04/07/2023 0:42:49
ERROR
Exception in task 0.1 in stage 0.0 (TID 1)
04/07/2023 0:42:51
WARN
Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HftpFileSystem not found
04/07/2023 0:42:51
WARN
Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.HsftpFileSystem not found
04/07/2023 0:42:53
ERROR
Aborting task
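Two of the warnings above are about missing Hadoop filesystem providers, but the exception that actually aborts the tasks is the Avro one: `UnresolvedUnionException: Not in union ["long","null"]: 1`. In Java Avro, a union branch only matches a datum of its exact runtime type, so a field declared as nullable `long` rejects a value that arrives as an `Integer`. The sketch below is a simplified Python model of that resolution step, not the real Avro API:

```python
# Simplified model of Java Avro's strict union resolution
# (illustrative only; the real logic lives in GenericData.resolveUnion).

class UnresolvedUnionError(Exception):
    pass

# Which Java runtime class each Avro union branch accepts (simplified).
BRANCH_ACCEPTS = {"long": "Long", "int": "Integer", "null": "Null"}

def resolve_union(branches, java_type, value):
    """Return the index of the first branch whose type matches the datum."""
    for i, branch in enumerate(branches):
        if BRANCH_ACCEPTS[branch] == java_type:
            return i
    # Mirrors the message seen in the pipeline log.
    raise UnresolvedUnionError(f"Not in union {branches}: {value}")

# A Long value of 1 resolves to the "long" branch:
print(resolve_union(["long", "null"], "Long", 1))    # 0
# But an Integer 1 matches neither "long" nor "null":
try:
    resolve_union(["long", "null"], "Integer", 1)
except UnresolvedUnionError as e:
    print(e)    # Not in union ['long', 'null']: 1
```

A common trigger for this error is an output schema field typed `long` while the corresponding source column (for example a SQL Server `int` or `bit`) is read as a narrower type; aligning the pipeline's output schema with the type the source actually produces avoids the mismatch.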

yqlxgs2m1#

The warnings and errors suggest that some filesystems could not be read from or written to. If I remember correctly, pipelines in Google Cloud need intermediate Cloud Storage to stage the data. Did you specify a Cloud Storage location, as described in step 8c of this document?
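If the staging location does need to be set explicitly, the Data Fusion BigQuery sink exposes a temporary Cloud Storage bucket setting. A hedged sketch of what that might look like in the exported pipeline JSON (the plugin and property names, and the bucket, dataset, and table values here are illustrative assumptions, not taken from the original post):

```json
{
  "name": "BigQuery",
  "plugin": {
    "name": "BigQueryTable",
    "type": "batchsink",
    "properties": {
      "dataset": "my_dataset",
      "table": "my_table",
      "bucket": "my-datafusion-staging-bucket"
    }
  }
}
```

As far as I know, if no bucket is supplied the sink creates and deletes a temporary bucket itself, which requires the pipeline's service account to have the corresponding Cloud Storage permissions.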
