How to convert Spark map keys into separate columns

dldeef67  posted on 2022-12-13  in  Apache

I am using Spark 2.3 and Scala 2.11.8.
I have a DataFrame like the one below:

--------------------------------------------------------
| ID  | Name | Desc_map                                |
--------------------------------------------------------
|  1  | abcd | "Company" -> "aa" , "Salary" -> "1" ....|
|  2  | efgh | "Company" -> "bb" , "Salary" -> "2" ....|
|  3  | ijkl | "Company" -> "cc" , "Salary" -> "3" ....|
|  4  | mnop | "Company" -> "dd" , "Salary" -> "4" ....|
--------------------------------------------------------

Expected DataFrame:

----------------------------------------
| ID  | Name | Company | Salary | .... |                         
----------------------------------------
|  1  | abcd |   aa    |   1    | .... |
|  2  | efgh |   bb    |   2    | .... |
|  3  | ijkl |   cc    |   3    | .... |
|  4  | mnop |   dd    |   4    | .... |
----------------------------------------

Any help would be much appreciated.

but5z9lq1#

If data is a Dataset with the following contents:

+---+----+----------------------------+
|ID |Name|Map                         |
+---+----+----------------------------+
|1  |abcd|{Company -> aa, Salary -> 1}|
|2  |efgh|{Company -> bb, Salary -> 2}|
|3  |ijkl|{Company -> cc, Salary -> 3}|
|4  |mnop|{Company -> aa, Salary -> 4}|
+---+----+----------------------------+

You can get the desired output by selecting each map key by name with selectExpr:

// Map.<key> pulls the value stored under that key out into its own column
val result = data.selectExpr(
  "ID",
  "Name",
  "Map.Company",
  "Map.Salary"
)

Final output:

+---+----+-------+------+
|ID |Name|Company|Salary|
+---+----+-------+------+
|1  |abcd|aa     |1     |
|2  |efgh|bb     |2     |
|3  |ijkl|cc     |3     |
|4  |mnop|aa     |4     |
+---+----+-------+------+
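The selectExpr call above lists the keys by hand, but the question's map contains more keys than the two shown ("...."). When the key set is not known in advance, one possible approach is to collect the distinct keys first and then build one column per key. The following is only a sketch under assumptions: the object name MapKeysToColumns and the helper expandMap are hypothetical, the map column is assumed to have string keys, and the number of distinct keys is assumed to be small enough to collect on the driver.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object MapKeysToColumns {

  // Hypothetical helper: turns every key found in `mapCol` into its own column.
  // Assumes string keys and a key set small enough to collect on the driver.
  def expandMap(df: DataFrame, mapCol: String): DataFrame = {
    // Exploding a map column yields two columns named "key" and "value".
    val keys: Array[String] = df
      .select(explode(col(mapCol)))
      .select("key")
      .distinct()
      .collect()
      .map(_.getString(0))
      .sorted // sort so the column order is deterministic

    // Keep every column except the map itself.
    val otherCols = df.columns.filter(_ != mapCol).map(col)
    // getItem(k) returns null for rows whose map lacks the key k.
    val keyCols = keys.map(k => col(mapCol).getItem(k).as(k))

    df.select(otherCols ++ keyCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-keys-to-columns")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the answer above.
    val data = Seq(
      (1, "abcd", Map("Company" -> "aa", "Salary" -> "1")),
      (2, "efgh", Map("Company" -> "bb", "Salary" -> "2")),
      (3, "ijkl", Map("Company" -> "cc", "Salary" -> "3")),
      (4, "mnop", Map("Company" -> "aa", "Salary" -> "4"))
    ).toDF("ID", "Name", "Map")

    expandMap(data, "Map").show(false)
    spark.stop()
  }
}
```

With the sample rows this should produce the columns ID, Name, Company and Salary. The extra pass over the data to find the keys is the price of not hard-coding them; if the keys are fixed and known, the plain selectExpr version is simpler.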

Good luck!