eclipse: How to iterate over JSON objects in Scala Spark

yvgpqqbh, posted on 2021-05-27 in Spark

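I have an input JSON file that contains two objects. When I read the file using a schema, I only get the values of the first object.
Here is my code:
// Sample JSON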

{
name: jack,
age: 30,
joinDate: 12-12-2018,
id: 01123
}
{
name: bob,
age: 25,
joinDate: 12-01-2019,
id: 02354
}

object readjson {
val Schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", StringType),
      StructField("joinDate", StringType),
      StructField("id", StringType)
    ));

    val json_file_path = "C:\\employee"

    val dataframe = spark
      .read
      .option("multiLine", true)
      .schema(Schema)
      .json(json_file_path)
      .show()
}

The output I get:

name age joinDate id
jack 30  12-12-2018 01123

Expected output:

name age joinDate id
jack 30  12-12-2018 01123
bob  25  12-01-2019 02354

8yparm6h  #1

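I tried your code with Spark 2.4.4 and it works fine. The only change I made was to enclose the strings in the JSON in double quotes (in the file below, the two objects are also wrapped in a JSON array):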

[
  {
    "name": "jack",
    "age": 30,
    "joinDate": "12-12-2018",
    "id": 1123
  },
  {
    "name": "bob",
    "age": 25,
    "joinDate": "12-01-2019",
    "id": 2354
  }
]

$ spark-shell
Spark context available as 'sc' (master = local[*], app id = local-1598014153867).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.types.{ StructType, StructField, StringType }
import org.apache.spark.sql.types.{StructType, StructField, StringType}

scala> object readjson {
     | val Schema = StructType(Seq(
     |       StructField("name", StringType),
     |       StructField("age", StringType),
     |       StructField("joinDate", StringType),
     |       StructField("id", StringType)
     |     ));
     | 
     |     val json_file_path = "<path-to-json-file>"
     | 
     |     val dataframe = spark
     |       .read
     |       .option("multiLine", true)
     |       .schema(Schema)
     |       .json(json_file_path)
     |       .show()
     | }
defined object readjson

scala> readjson
+----+---+----------+----+
|name|age|  joinDate|  id|
+----+---+----------+----+
|jack| 30|12-12-2018|1123|
| bob| 25|12-01-2019|2354|
+----+---+----------+----+

res0: readjson.type = readjson$@4759b196
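
As a possible alternative to wrapping the objects in an array, the sketch below assumes the data can be stored as JSON Lines, that is, one complete JSON object per line, and reads it without the multiLine option. The object name ReadJsonLines, the helper parse, and the inline sample records are made up for illustration; the ids are quoted here so that their leading zeros are kept. This is only a minimal sketch under those assumptions, not necessarily what the original setup needs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ReadJsonLines {
  // Same four string columns as the schema in the question.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("age", StringType),
    StructField("joinDate", StringType),
    StructField("id", StringType)
  ))

  // Hypothetical helper: parses two inline JSON Lines records.
  // With a real file, spark.read.schema(schema).json(path) without
  // the multiLine option treats each line as one JSON object.
  def parse(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val lines = Seq(
      """{"name": "jack", "age": 30, "joinDate": "12-12-2018", "id": "01123"}""",
      """{"name": "bob", "age": 25, "joinDate": "12-01-2019", "id": "02354"}"""
    ).toDS()
    spark.read.schema(schema).json(lines)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-json-lines")
      .master("local[*]")
      .getOrCreate()

    val df = parse(spark)
    df.show()

    // Iterate over the rows on the driver; fine for small data only.
    df.collect().foreach { row =>
      println(s"${row.getAs[String]("name")} joined on ${row.getAs[String]("joinDate")}")
    }

    spark.stop()
  }
}
```

Collecting to the driver is only reasonable for small inputs like this one; for larger data, the iteration would normally stay inside DataFrame operations.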
