Error when saving as Parquet {Could not read footer}

vx6bjr1n  posted on 2021-05-29 in Hadoop

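I am writing a Spark application. After processing the logs, I save the output in Parquet format (using the dataframe.saveAsParquetFile() API), but the save intermittently fails with the error below. If I re-run the save step, the error goes away.
Please let me know the root cause of this.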

java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus@6233d82f
        at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:238)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:369)
        at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154)
        at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152)
        at org.apache.spark.sql.parquet.ParquetRelation2.refresh(newParquet.scala:197)
        at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:134)
        at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:114)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
        at org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:1494)
        at ParquetWriter$.main(ParquetWriter.scala:182)
        at ParquetWriter.main(ParquetWriter.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Could not read footer for file org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus@6233d82f
        at parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:230)
        at parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:224)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: file:/{directory}/output.parquet2/part-r-01418.gz.parquet at 17920
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:248)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:273)
        at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:211)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:229)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:193)
        at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:431)
        at org.apache.hadoop.fs.FSInputChecker.seek(FSInputChecker.java:412)
        at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:48)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:318)
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:413)
        at parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:228)
        ... 5 more

Thanks
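
A note on reading the trace: the innermost cause is a ChecksumException raised by Hadoop's ChecksumFileSystem on a local path (file:/...). That filesystem stores a CRC for each fixed-size chunk of a file in a hidden .crc sidecar file and compares the two on read. The reported offset, 17920, is 35 × 512, which is consistent with the usual 512-byte chunk size. The sketch below is only an illustration of that kind of per-chunk check, not Hadoop's actual code; the chunk size, the function names, and the sample data are assumptions, and it does not explain why the data and checksums disagreed in this job.

```python
import zlib
from typing import List, Optional

BYTES_PER_CHECKSUM = 512  # assumption: the common default chunk size


def chunk_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Compute a CRC32 for each consecutive chunk of data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def first_bad_offset(
    data: bytes, expected: List[int], chunk_size: int = BYTES_PER_CHECKSUM
) -> Optional[int]:
    """Return the byte offset of the first chunk whose CRC differs, else None."""
    actual = chunk_checksums(data, chunk_size)
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index * chunk_size
    if len(actual) != len(expected):
        # One side has more chunks than the other: report where they diverge.
        return min(len(actual), len(expected)) * chunk_size
    return None


# Hypothetical stand-in for a part file: 20480 bytes, i.e. 40 chunks.
original = bytes(range(256)) * 80
sidecar = chunk_checksums(original)  # plays the role of the .crc file

# Simulate the data and its checksums going out of sync by changing
# one byte that lies inside chunk 35 (bytes 17920 to 18431).
corrupted = bytearray(original)
corrupted[17925] ^= 0xFF

print(first_bad_offset(original, sidecar))
print(first_bad_offset(bytes(corrupted), sidecar))
```

The check reports the start of the failing chunk rather than the exact damaged byte, which is why an offset in such an error tends to be a multiple of the chunk size.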
