Excel — converting an RDD of strings into a DataFrame

4xrmg8kj  posted 2021-05-27 in Spark

The RDD data should be converted into a DataFrame, but I can't get it to work. toDF doesn't work, and I also tried going from an Array RDD to a DataFrame. Please advise. This program parses a sample Excel file using Scala and Spark.

import java.io.{File, FileInputStream}
import org.apache.poi.xssf.usermodel.{XSSFSheet, XSSFWorkbook}
import org.apache.poi.ss.usermodel.Cell._
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

object excel {
  def main(args: Array[String]) = {
    val sc = new SparkContext(new SparkConf().setAppName("Excel Parsing").setMaster("local[*]"))
    val file = new FileInputStream(new File("test.xlsx"))
    val wb = new XSSFWorkbook(file)
    val sheet = wb.getSheetAt(0)
    val rowIterator = sheet.iterator()
    val builder = StringBuilder.newBuilder
    var column = ""
    while (rowIterator.hasNext()) {
      val row = rowIterator.next()
      val cellIterator = row.cellIterator()
      while (cellIterator.hasNext()) {
        val cell = cellIterator.next()
        cell.getCellType match {
          case CELL_TYPE_NUMERIC => builder.append(cell.getNumericCellValue + ",")
          case CELL_TYPE_BOOLEAN => builder.append(cell.getBooleanCellValue + ",")
          case CELL_TYPE_STRING  => builder.append(cell.getStringCellValue + ",")
          case CELL_TYPE_BLANK   => builder.append(",")
        }
      }
      column = builder.toString()
      println(column)
      builder.setLength(0)
    }
    val data = sc.parallelize(column)
    println(data)
  }
}

yrdbyhpb1#

To convert a Spark RDD to a DataFrame, you need to create a sqlContext or a SparkSession, depending on your Spark version, and then use:

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

If you are using Spark 2.0 or later, use SparkSession instead, since SQLContext is deprecated in the newer versions:

val spark = SparkSession.builder.config(conf).getOrCreate()
import spark.implicits._

This will allow you to use toDF on the RDD. That should solve your problem!
Note: to use SQLContext you must include spark-sql as a dependency!
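Putting the pieces together, here is a minimal sketch of the conversion. It assumes an RDD of comma-joined row strings, like the ones the question's loop builds, and uses hypothetical column names (`value`, `name`, `flag`) and sample data in place of the real spreadsheet:

```scala
import org.apache.spark.sql.SparkSession

object ExcelToDf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("Excel Parsing")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables .toDF on RDDs of tuples/case classes

    // Stand-in for the parsed sheet: one comma-joined string per row
    val rows = Seq("1.0,foo,true", "2.0,bar,false")
    val rdd  = spark.sparkContext.parallelize(rows)

    // Split each line into a tuple so the DataFrame gets one column per cell
    val df = rdd.map(_.split(","))
      .map(a => (a(0).toDouble, a(1), a(2).toBoolean))
      .toDF("value", "name", "flag") // hypothetical column names

    df.show()
    spark.stop()
  }
}
```

Note that in the question's code, `sc.parallelize(column)` parallelizes a single String (an RDD of Chars); collecting each row into a `Seq[String]` first, as above, gives you one DataFrame row per spreadsheet row.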
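For reference, the sbt coordinates look roughly like this (the version number is an assumption; match it to your Spark installation):

```scala
// build.sbt — hypothetical version, align with your cluster's Spark release
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8"
```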
