Scala: can't get nested JSON values as columns

vohkndzv · published 2023-01-13 in Scala

I am trying to define a schema for this JSON and expose the nested fields as columns in the DataFrame.
Input JSON:

{"place":{"place_name":"NYC","lon":0,"lat":0,"place_id":1009}, "region":{"region_issues":[{"key":"health","issue_name":"Cancer"},{"key":"sports","issue_name":"swimming"}]}}

Code:

val schemaRsvp = new StructType()
  .add("place", StructType(Array(
    StructField("place_name", DataTypes.StringType),
    StructField("lon", DataTypes.IntegerType),
    StructField("lat", DataTypes.IntegerType),
    StructField("place_id", DataTypes.IntegerType))))

val ip = spark.read.schema(schemaRsvp).json("D:\\Data\\rsvp\\inputrsvp.json")
ip.show()

It shows all the fields inside a single place column; I want the values as separate columns:

place_name,lon,lat,place_id
NYC,0,0,1009

Any suggestions on how to solve this?
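(For reference, a sketch of a schema that also covers the region array in the input JSON above; field names are taken from that JSON, everything else is an assumption about how the asker would extend their existing schemaRsvp:)

```scala
import org.apache.spark.sql.types._

// Sketch: extend the schema to include "region", whose "region_issues"
// field is an array of structs with "key" and "issue_name".
val schemaRsvp = new StructType()
  .add("place", new StructType()
    .add("place_name", StringType)
    .add("lon", IntegerType)
    .add("lat", IntegerType)
    .add("place_id", IntegerType))
  .add("region", new StructType()
    .add("region_issues", ArrayType(new StructType()
      .add("key", StringType)
      .add("issue_name", StringType))))
```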


42fyovps1#

You can use ".*" to convert the struct type into columns:

ip.select("place.*").show()

+----------+---+---+--------+
|place_name|lon|lat|place_id|
+----------+---+---+--------+
|       NYC|  0|  0|    1009|
+----------+---+---+--------+
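(Equivalently, and only as a sketch, each struct field can be selected by its dotted path; this is more verbose than "place.*" but lets you pick or rename individual columns:)

```scala
import org.apache.spark.sql.functions.col

// Select struct fields explicitly instead of expanding them all.
ip.select(
  col("place.place_name").as("place_name"),
  col("place.lon").as("lon"),
  col("place.lat").as("lat"),
  col("place.place_id").as("place_id")
).show()
```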
Update:

With the new array column, you can explode your data and then apply the same ".*" to convert the struct type into columns:

ip.select(col("place"), explode(col("region.region_issues")).as("region_issues"))
  .select("place.*", "region_issues.*").show(false)

+---+---+--------+----------+----------+------+
|lat|lon|place_id|place_name|issue_name|key   |
+---+---+--------+----------+----------+------+
|0  |0  |1009    |NYC       |Cancer    |health|
|0  |0  |1009    |NYC       |swimming  |sports|
+---+---+--------+----------+----------+------+
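(One caveat, sketched under the assumption that some input records may have a null or empty region_issues array: plain explode drops such rows, while explode_outer keeps them with null issue columns:)

```scala
import org.apache.spark.sql.functions.{col, explode_outer}

// explode_outer keeps rows whose array is null or empty,
// emitting nulls for the exploded columns instead of dropping the row.
ip.select(col("place"), explode_outer(col("region.region_issues")).as("region_issues"))
  .select("place.*", "region_issues.*")
  .show(false)
```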
