Converting a DataFrame to CSV throws an error in PySpark

yx2lnoni · Posted 2021-05-27 in Spark

I have a huge DataFrame with roughly 7 GB of records. Both getting the DataFrame's count and downloading it as a CSV fail with the error below. Is there another way to download the DataFrame without the output being split across multiple partitions?

# Count the rows, then collapse to one partition so the CSV comes out as a single file
print(df.count())
df.coalesce(1).write.option("header", "true").csv('/user/ABC/Output.csv')
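For context, a small sketch (assuming df is the DataFrame above): coalesce(1) collapses the write to a single output partition, so one task has to pull in and serialize the entire ~7 GB, while the count fails independently during a shuffle/spill read, as the traceback below shows. The partition counts can be inspected directly:

print(df.rdd.getNumPartitions())              # e.g. 60, matching the stage shown below
print(df.coalesce(1).rdd.getNumPartitions())  # 1: the whole write runs in a single task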

Error:
java.io.IOException: Stream is corrupted
    at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:202)
    at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:228)
    at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157)
    at org.apache.spark.io.ReadAheadInputStream$1.run(ReadAheadInputStream.java:168)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
20/05/26 18:15:44 ERROR scheduler.TaskSetManager: Task 8 in stage 360.0 failed 1 times; aborting job
[Stage 360:=======>                                                (8 + 1) / 60]
Py4JJavaError: An error occurred while calling o18867.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 360.0 failed 1 times, most recent failure: Lost task 8.0 in stage 360.0 (TID 13986, localhost, executor driver): java.io.IOException: Stream is corrupted
    at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:202)
    at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:228)
    at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157)
    at org.apache.spark.io.ReadAheadInputStream$1.run(ReadAheadInputStream.java:168)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
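A hedged workaround sketch, not a confirmed fix: the java.io.IOException: Stream is corrupted raised inside LZ4BlockInputStream during a shuffle read matches a known Spark issue (see SPARK-18105), and funnelling all ~7 GB through the single task created by coalesce(1) makes it more likely to surface. One alternative, assuming a SparkSession named spark and an HDFS output directory (the codec value and paths below are illustrative assumptions), is to switch the compression codec away from LZ4 and let Spark write the CSV as multiple part files:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Assumed workaround: avoid the LZ4 codec for shuffle/spill streams.
    # This must be set before the SparkContext starts; on an existing
    # session, pass it via spark-submit --conf instead.
    .config("spark.io.compression.codec", "snappy")
    .getOrCreate()
)

df = spark.read.parquet("/user/ABC/input")  # hypothetical source path

print(df.count())

# Write one part file per partition instead of forcing a single task
df.write.option("header", "true").mode("overwrite").csv('/user/ABC/Output')

The part files can then be combined outside Spark, for example with hdfs dfs -getmerge /user/ABC/Output Output.csv. Note that with the header option enabled, each part file repeats the header line, so either drop the option or strip the duplicate headers while merging.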

