Spark: "Multiple sources found" error when saving a Parquet file

vwkv1x7d · posted 2021-05-27 in Spark

I am trying to learn Spark and Scala. When I try to write a resulting DataFrame to a Parquet file by calling the parquet method, I get the error below.
This code fails:

df2.write.mode(SaveMode.Overwrite).parquet(outputPath)

This also fails:

df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).parquet(outputPath)

Error log:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:707)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:967)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:304)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:848)

The code runs correctly if I call a different method to save.
This works fine:

df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).save(outputPath)

Although I have a workaround, I would like to understand why the first approach does not work and how to fix it properly.
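One plausible explanation (an assumption, not confirmed in this thread): in Spark, `DataFrameWriter.parquet(path)` is effectively shorthand for `format("parquet").save(path)`, so calling `.parquet` discards any fully qualified format set earlier and re-resolves the short name "parquet", which is exactly the ambiguous lookup the error complains about. Under that assumption, only `save` preserves the explicit class. A sketch reusing the thread's own `df2` and `outputPath`:

```scala
// Sketch (assumption): DataFrameWriter.parquet behaves roughly like
//   def parquet(path: String): Unit = format("parquet").save(path)
// so .parquet always re-resolves the ambiguous short name "parquet".

// Fails: the explicit format below is replaced by the short name "parquet".
df2.write
  .format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat")
  .mode(SaveMode.Overwrite)
  .parquet(outputPath)

// Works: save() keeps the fully qualified format, so the lookup is unambiguous.
df2.write
  .format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat")
  .mode(SaveMode.Overwrite)
  .save(outputPath)
```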
The versions I am using are: Scala 2.12.9, Java 1.8, Spark 2.4.4.
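Also worth noting: `ParquetDataSourceV2`, the first class named in the error, only exists in Spark 3.x, while `ParquetFileFormat` is the source Spark 2.4 registers, so the error suggests jars from two different Spark versions ended up on the classpath despite targeting 2.4.4. A hedged sketch of how to rule this out, assuming an sbt build (the `Provided` scope and layout are illustrative, not from the thread):

```scala
// build.sbt sketch (assumption: sbt project). Pinning every Spark module to a
// single version keeps only one "parquet" DataSourceRegister on the classpath,
// which avoids the "Multiple sources found for parquet" lookup conflict.
val sparkVersion = "2.4.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql"  % sparkVersion % Provided
)
```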
P.S. This problem is only seen on Spark.

No answers yet.

