update structs数组-spark

gxwragnw  于 2021-05-29  发布在  Spark
关注(0)|答案(1)|浏览(391)

我有下面的spark delta表结构,

+---+------------------------------------------------------+
|id |addresses                                             |
+---+------------------------------------------------------+
|1  |[{"Address":"ABC", "Street": "XXX"}, {"Address":"XYZ", "Street": "YYY"}]|
+---+------------------------------------------------------+

这里的addresses列是一个结构数组。
我需要将数组中的第一个地址从“street”属性值更新为“”,而不更改列表中的第二个元素。
因此,“”应更新为“”,而“xyz”应更新为“yyy”
你可以假设,我在结构中有很多属性,比如street,zipcode等等,所以我想让它们保持不变,只是从street属性更新address的值。
如何在spark、databricks或sql中实现这一点?
模式,

|-- id: string (nullable = true)
|-- addresses: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- Address: string (nullable = true)
 |    |    |    |    |-- Street: string (nullable = true)

干杯!

guicsvcw

guicsvcw1#

请检查下面的代码。

scala> vdf.show(false)
+---+--------------+
|id |addresses     |
+---+--------------+
|1  |[[ABC], [XYZ]]|
+---+--------------+

scala> vdf.printSchema
root
 |-- id: integer (nullable = false)
 |-- addresses: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Address: string (nullable = true)
scala> val new_address = array(struct(lit("AAA").as("Address")))
scala> val except_first = array_except($"addresses",array($"addresses"(0)))
scala> val addresses = array_union(new_address,except_first).as("addresses")
scala> vdf.select($"id",addresses).select($"id",$"addresses",to_json($"addresses").as("json_addresses")).show(false)

+---+--------------+-------------------------------------+
|id |addresses     |json_addresses                       |
+---+--------------+-------------------------------------+
|1  |[[AAA], [XYZ]]|[{"Address":"AAA"},{"Address":"XYZ"}]|
+---+--------------+-------------------------------------+

更新

scala> vdf.withColumn("addresses",explode($"addresses")).groupBy($"id").agg(collect_list(struct($"addresses.Street".as("Address"),$"addresses.Street")).as("addresses")).withColumn("json_data",to_json($"addresses")).show(false)
+---+------------------------+-------------------------------------------------------------------+
|id |addresses               |json_data                                                          |
+---+------------------------+-------------------------------------------------------------------+
|1  |[[XXX, XXX], [YYY, YYY]]|[{"Address":"XXX","Street":"XXX"},{"Address":"YYY","Street":"YYY"}]|
+---+------------------------+-------------------------------------------------------------------+

相关问题