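In this DataFrame I have two arrays: discount_applications and line_items. The line_items array has an inner array called discount_allocations, whose elements have a field called discount_application_index. The task is to use the discount_application_index value to look up the "type" value at that index in the discount_applications array and copy it into the corresponding application_type field.
Here is the DataFrame: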
records = '[{"_c":{"discount_applications":[{"type":"manual0"},{"type":"manual1"},{"type":"manual2"},{"type":"manual3"}],"line_items":[{"discount_allocations":[{"application_type":"","discount_application_index":0}]},{"discount_allocations":[{"application_type":"","discount_application_index":1}]},{"discount_allocations":[{"application_type":"","discount_application_index":2}]},{"discount_allocations":[{"application_type":"","discount_application_index":3}]}]}},{"_c":{"discount_applications":[{"type":"manual0"},{"type":"manual1"},{"type":"manual2"}],"line_items":[{"discount_allocations":[{"application_type":"","discount_application_index":0}]},{"discount_allocations":[{"application_type":"","discount_application_index":1}]},{"discount_allocations":[{"application_type":"","discount_application_index":2}]}]}},{"_c":{"discount_applications":[{"type":"manual0"},{"type":"manual1"},{"type":"manual2"}],"line_items":[{"discount_allocations":[{"application_type":"","discount_application_index":0}]},{"discount_allocations":[{"application_type":"","discount_application_index":1}]},{"discount_allocations":[{"application_type":"","discount_application_index":2}]}]}}]'
df = spark.read.json(sc.parallelize([records]))
df.printSchema()
df.show(truncate=False)
root
 |-- _c: struct (nullable = true)
 |    |-- discount_applications: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- type: string (nullable = true)
 |    |-- line_items: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- discount_allocations: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- application_type: string (nullable = true)
 |    |    |    |    |    |-- discount_application_index: long (nullable = true)
+--------------------------------------------------------------------------------------------+
|_c |
+--------------------------------------------------------------------------------------------+
|[[[manual0], [manual1], [manual2], [manual3]], [[[[, 0]]], [[[, 1]]], [[[, 2]]], [[[, 3]]]]]|
|[[[manual0], [manual1], [manual2]], [[[[, 0]]], [[[, 1]]], [[[, 2]]]]] |
|[[[manual0], [manual1], [manual2]], [[[[, 0]]], [[[, 1]]], [[[, 2]]]]] |
+--------------------------------------------------------------------------------------------+
After the transformation, the DataFrame should look like this:
+------------------------------------------------------------------------------------------------------------------------+
|_c |
+------------------------------------------------------------------------------------------------------------------------+
|[[[manual0], [manual1], [manual2], [manual3]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]], [[[manual3, 3]]]]]|
|[[[manual0], [manual1], [manual2]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]]]] |
|[[[manual0], [manual1], [manual2]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]]]] |
+------------------------------------------------------------------------------------------------------------------------+
1 Answer
Wrap your head around the nested structure, then use transform :)
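To make that hint concrete, here is a minimal sketch of what a nested transform could look like. It assumes Spark 2.4 or later (where the SQL higher-order function transform is available) and the exact schema shown in the question. The names TRANSFORM_EXPR, with_application_types and fill_application_types are made up for this illustration. The plain-Python function only mirrors the same lookup on a dict, so the logic can be checked without a Spark session.

```python
import copy

# Spark SQL expression (assumes Spark 2.4+ and the schema from the question).
# The outer transform walks line_items, the inner one walks
# discount_allocations; each allocation is rebuilt with application_type
# looked up by its 0-based index in _c.discount_applications.
# Caveat: named_struct rebuilds the structs, so any field not listed here
# would be dropped; the schema in the question has no other fields.
TRANSFORM_EXPR = """
named_struct(
  'discount_applications', _c.discount_applications,
  'line_items', transform(_c.line_items, li -> named_struct(
    'discount_allocations', transform(li.discount_allocations, da -> named_struct(
      'application_type',
        _c.discount_applications[da.discount_application_index].type,
      'discount_application_index', da.discount_application_index
    ))
  ))
)
"""


def with_application_types(df):
    """Hypothetical helper: replace the _c column using the expression above."""
    # Imported lazily so the rest of this sketch runs without PySpark installed.
    from pyspark.sql import functions as F
    return df.withColumn("_c", F.expr(TRANSFORM_EXPR))


def fill_application_types(record):
    """Plain-Python mirror of the same lookup, applied to one parsed record."""
    out = copy.deepcopy(record)  # leave the input untouched
    apps = out["_c"]["discount_applications"]
    for item in out["_c"]["line_items"]:
        for alloc in item["discount_allocations"]:
            idx = alloc["discount_application_index"]
            in_range = 0 <= idx < len(apps)
            # Out-of-range indexes become None in this sketch.
            alloc["application_type"] = apps[idx]["type"] if in_range else None
    return out
```

With the DataFrame from the question, the call would be with_application_types(df).show(truncate=False). I have not run this against every Spark version, so treat the expression as a starting point rather than a drop-in solution.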