I am new to Apache Spark, and I want to get the size of the Parquet output file.
My scenario is: read a file from CSV and save it as a text file

myRDD.saveAsTextFile("person.txt")

After saving the file, the Spark UI (localhost:4040) shows inputBytes 15607801 and outputBytes 13551724.
But when I save as a Parquet file:

myDF.saveAsParquetFile("person.parquet")

the UI (localhost:4040), on the Stages tab, shows only inputBytes 15607801 and nothing in outputBytes.
Can anyone help me? Thanks in advance.
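For context, a minimal sketch of the flow described above (Spark 1.x, where `saveAsParquetFile` still exists; the `Person` case class, the CSV layout, and the paths are illustrative assumptions, not from the original program):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetExample {
  // Hypothetical schema for the CSV rows.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ParquetExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Read the CSV as plain lines and save as text:
    // the Stages tab reports outputBytes for this stage.
    val myRDD = sc.textFile("person.csv")
    myRDD.saveAsTextFile("person.txt")

    // Parse the lines into a DataFrame and save as Parquet:
    // the Stages tab reports outputBytes 0 for this stage.
    val myDF = myRDD.map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
      .toDF()
    myDF.saveAsParquetFile("person.parquet")
  }
}
```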
Edit: when I call the REST API, it gives the following response:
[ {
"status" : "COMPLETE",
"stageId" : 4,
"attemptId" : 0,
"numActiveTasks" : 0,
"numCompleteTasks" : 1,
"numFailedTasks" : 0,
"executorRunTime" : 10955,
"inputBytes" : 15607801,
"inputRecords" : 1440721,
**"outputBytes" : 0,**
**"outputRecords" : 0,**
"shuffleReadBytes" : 0,
"shuffleReadRecords" : 0,
"shuffleWriteBytes" : 0,
"shuffleWriteRecords" : 0,
"memoryBytesSpilled" : 0,
"diskBytesSpilled" : 0,
"name" : "saveAsParquetFile at ParquetExample.scala:82",
"details" : "org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:1494)\ncom.spark.sql.ParquetExample$.main(ParquetExample.scala:82)\ncom.spark.sql.ParquetExample.main(ParquetExample.scala)",
"schedulingPool" : "default",
"accumulatorUpdates" : [ ]
}, {
"status" : "COMPLETE",
"stageId" : 3,
"attemptId" : 0,
"numActiveTasks" : 0,
"numCompleteTasks" : 1,
"numFailedTasks" : 0,
"executorRunTime" : 2091,
"inputBytes" : 15607801,
"inputRecords" : 1440721,
**"outputBytes" : 13551724,**
**"outputRecords" : 1200540,**
"shuffleReadBytes" : 0,
"shuffleReadRecords" : 0,
"shuffleWriteBytes" : 0,
"shuffleWriteRecords" : 0,
"memoryBytesSpilled" : 0,
"diskBytesSpilled" : 0,
"name" : "saveAsTextFile at ParquetExample.scala:77",
"details" : "org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)\ncom.spark.sql.ParquetExample$.main(ParquetExample.scala:77)\ncom.spark.sql.ParquetExample.main(ParquetExample.scala)",
"schedulingPool" : "default",
"accumulatorUpdates" : [ ]
} ]
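Since the Parquet stage reports `outputBytes 0`, one workaround (an assumption on my part, not something from the post above) is to ask the filesystem directly for the size of the output directory after the save completes, using Hadoop's `FileSystem.getContentSummary`:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// A Parquet save produces a directory of part files;
// getContentSummary sums the lengths of everything under it.
val fs = FileSystem.get(new Configuration())
val summary = fs.getContentSummary(new Path("person.parquet"))
println(s"Parquet output size: ${summary.getLength} bytes")
```

This reports the on-disk size regardless of what the Stages tab or the `/api/v1` stage metrics show.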