Spark saveAsTextFile() creates a "_temporary -> 0" folder structure instead of the path given to the function

mqkwyuun · posted 2021-05-27 in Spark
Follow (0) | Answers (2) | Views (350)

Using Scala IDE to save the output to a file, with the following code:

import org.apache.spark.sql.SparkSession

object RDDWithCSVFile {
  def main(args : Array[String]): Unit={
    val spark=SparkSession.builder()
    .appName("Creating RDD with CSV Files")
    .master("local")
    .getOrCreate()

    val rdd= spark.sparkContext.textFile("src/test/resources/datasets/CDH_Wellness.csv")

    val header=rdd.first()

    val rddwithoutheader= rdd.filter(!_.contains(header))

    val elements= rddwithoutheader.map(line => {
      val colarray = line.split(",")
      Array((colarray(0),colarray(4),colarray(5),colarray(10))).mkString(" ")

    })

      elements.saveAsTextFile("C:/Spark_Files/RDDWithCSVFile/New Folder") 
  }
}

But instead of creating the output file part-00000, it creates the folder structure below:

C:\Spark_Files\RDDWithCSVFile\New Folder\_temporary\0\_temporary\attempt_20200526184311_0006_m_000000_0

Under this directory only part-00000 is created, but it is an empty file. No _SUCCESS file is created.
Can anyone offer some suggestions?
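As a side note, `filter(!_.contains(header))` drops every line that contains the header text, which can silently discard data rows as well. Dropping only the physical first line is safer; in Spark that is usually done with `rdd.mapPartitionsWithIndex { (idx, it) => if (idx == 0) it.drop(1) else it }`. A minimal sketch of that logic, with plain Scala iterators standing in for the partitions (the object and helper names here are illustrative, not from the original post):

```scala
// Sketch: drop only the first line of the first partition, the way
// mapPartitionsWithIndex would, instead of filtering by header text.
object DropHeaderSketch {
  // Each Iterator[String] plays the role of one RDD partition.
  def dropFirstLine(partitions: Seq[Iterator[String]]): Seq[String] =
    partitions.zipWithIndex.flatMap { case (it, idx) =>
      if (idx == 0) it.drop(1) else it // skip the header in partition 0 only
    }

  def main(args: Array[String]): Unit = {
    val part0 = Iterator("id,name,city", "1,abc,nyc") // header lives here
    val part1 = Iterator("2,def,sfo")
    val rows  = dropFirstLine(Seq(part0, part1))
    println(rows.mkString("|")) // 1,abc,nyc|2,def,sfo
  }
}
```

This keeps a data row even if it happens to contain the same text as the header.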


lokaqttq1#

This is the error from the console:

Caused by: java.io.IOException: (null) entry in command string: null chmod 0644 C:\Spark_Files\RDDWithCSVFile\New Folder\_temporary\0\_temporary\attempt_20200527112424_0006_m_000000_0\part-00000
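This `(null) entry in command string: null chmod 0644` error is the usual symptom of running Hadoop file output on Windows without the native `winutils.exe` binaries. The common workaround is to place `winutils.exe` under a Hadoop home directory and point `hadoop.home.dir` at it before the SparkSession is created. A hedged sketch, assuming winutils.exe lives in `C:\hadoop\bin` (that path is an assumption, adjust to your install):

```scala
// Sketch of the usual Windows workaround: set hadoop.home.dir to a
// directory containing bin\winutils.exe BEFORE building the SparkSession.
// The C:\hadoop location is an assumption, not from the original post.
object HadoopHomeSetup {
  def configureHadoopHome(home: String): String = {
    System.setProperty("hadoop.home.dir", home)
    sys.props("hadoop.home.dir") // return what was set, for verification
  }

  def main(args: Array[String]): Unit = {
    println(configureHadoopHome("C:\\hadoop")) // prints C:\hadoop
    // ...then build the SparkSession as in the question.
  }
}
```

Setting the `HADOOP_HOME` environment variable to the same directory works equally well.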


k5hmc34c2#

Updated code:

    val rdd = spark.sparkContext.textFile("src/test/resources/datasets/CDH_Wellness.csv")

    val header = rdd.first()

    val rddwithoutheader = rdd.filter(_ != header)

    val elements = rddwithoutheader.map(line => {
      val colarray = line.split(",")
      Array((colarray(0), colarray(4), colarray(5), colarray(10))).mkString(" ")
    })

    elements.saveAsTextFile("C:/Spark_Files/RDDWithCSVFile/Output")

*************

I updated the output path to C:/Spark_Files/RDDWithCSVFile/Output, but it still creates the earlier directory:

C:\Spark_Files\RDDWithCSVFile\New Folder\_temporary\0\_temporary\attempt_20200527112424_0006_m_000000_0

The RDD is not empty, but the output file it creates is empty.
