AWS S3 vs. Snowflake SQL when using coalesce in a Spark DataFrame

Posted by jhiyze9q on 2021-05-29 in Spark


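I have a DataFrame that I write to a target location in an S3 bucket. The DataFrame is built from one of two inputs: Parquet files on AWS S3, or a Snowflake SQL query. Our code uses coalesce before writing. When the DataFrame comes from AWS S3, coalesce works without any problem, but when the input is Snowflake SQL the job fails with SparkOutOfMemoryError.
Snowflake code: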

empsql = 'Select * From Employee'
df = spark.sql(empsql)  # Spark session is already configured
df.coalesce(2).write.mode('overwrite').format("parquet").option("delimiter",'|').save(s3_path, header = True)
Result: fails with SparkOutOfMemoryError; the same write works with repartition.

AWS S3 code:

empsql = s3_path
aws_s3_df = spark.read.parquet(s3_location)
aws_s3_df.coalesce(2).write.mode('overwrite').format("parquet").option("delimiter",'|').save(s3_path, header = True)
Result: no error; coalesce works.

Why does coalesce work with the AWS S3 input but not with the Snowflake SQL input?
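
For context, and as a sketch rather than a confirmed diagnosis of the error above: Spark's documentation notes that a drastic coalesce can make the computation run on fewer tasks than intended, and suggests repartition, which adds a shuffle, when that is a problem. The helper below makes that choice explicit. The name `write_parquet_files` and its parameters are hypothetical and are not part of the original code; `df` is assumed to be a PySpark DataFrame.

```python
def write_parquet_files(df, path, num_files, shuffle=True):
    """Write df to path as Parquet using num_files output partitions.

    shuffle=True calls repartition(), which inserts a shuffle so the
    stage that produces df keeps its own parallelism.
    shuffle=False calls coalesce(), which avoids the shuffle but may
    also reduce the upstream stage to num_files tasks.
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")

    if shuffle:
        compacted = df.repartition(num_files)
    else:
        compacted = df.coalesce(num_files)

    # "delimiter" and "header" are CSV options, so they are omitted
    # here because the output format is Parquet.
    compacted.write.mode("overwrite").parquet(path)
    return compacted
```

With the names from the question, the variant reported as working would be `write_parquet_files(df, s3_path, 2)`, and the failing one would be `write_parquet_files(df, s3_path, 2, shuffle=False)`.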

DataFrame apache-spark pyspark parquet snowflake-cloud-data-platform

Source: https://stackoverflow.com/questions/62521861/aws-s3-vs-snowflake-sql-using-coalesce-in-spark-dataframe

No answers yet.
