Is there a way to create a list of all columns in Spark?

Posted by dbf7pr2w on 2021-05-29 in Spark


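I have a dataset (the image from the original post is not available here).
For each value of the first column (in this example there are only two, 1 and 2), I need a list of all the values from the other columns.
I tried grouping by the first column and would like to aggregate over all the remaining columns. So far I can only aggregate the second column, with the following code (the result was shown in an image in the original post):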

df.groupBy("_c0").agg(collect_list("_c1")).show()

The idea is to get, for each value of the first column, a single list containing all the values from the other columns.

sql scala apache-spark

Source: https://stackoverflow.com/questions/62218092/is-there-a-way-to-create-a-list-of-all-columns-in-spark

1 Answer


ego6inou1#

Try this:

scala> val df = List((1,2,3,4),(1,5,6,7),(2,8,9,10),(2,11,12,13)).toDF
df: org.apache.spark.sql.DataFrame = [_1: int, _2: int ... 2 more fields]

scala> df.show
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
|  1|  2|  3|  4|
|  1|  5|  6|  7|
|  2|  8|  9| 10|
|  2| 11| 12| 13|
+---+---+---+---+

scala> val firstColName = "_1"
firstColName: String = _1

scala> df.groupBy(firstColName).agg(flatten(collect_list(array(df.columns.filterNot(c=>c.equals(firstColName)).map(c=>col(c)):_*))).as("otherCols")).show(false)
+---+----------------------+
|_1 |otherCols             |
+---+----------------------+
|1  |[2, 3, 4, 5, 6, 7]    |
|2  |[8, 9, 10, 11, 12, 13]|
+---+----------------------+
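
For use outside the spark-shell, the one-liner above can be unpacked into named steps. The following is only a sketch under a few assumptions: the object name `CollectOtherColumns`, the helper `collectOtherColumns`, and the local `SparkSession` settings are made up for illustration, and `flatten` requires Spark 2.4 or later. It also assumes that all non-key columns share one type, because `array` needs a single element type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, collect_list, flatten}

// Hypothetical wrapper object; the name is illustrative only.
object CollectOtherColumns {

  // Groups by keyCol and gathers the values of every other column
  // into one array per key, stored in a column named "otherCols".
  def collectOtherColumns(df: DataFrame, keyCol: String): DataFrame = {
    // Every column except the grouping key.
    val otherCols = df.columns.filterNot(_ == keyCol).map(col)

    df.groupBy(keyCol).agg(
      // array(...) packs the values of one row into an array,
      // collect_list gathers those arrays for each group,
      // flatten merges the array of arrays into a single array.
      flatten(collect_list(array(otherCols: _*))).as("otherCols")
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session for a standalone run.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-other-columns")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the REPL session above.
    val df = List((1, 2, 3, 4), (1, 5, 6, 7), (2, 8, 9, 10), (2, 11, 12, 13)).toDF()

    collectOtherColumns(df, "_1").show(false)

    spark.stop()
  }
}
```

Note that `collect_list` does not guarantee the order of the collected rows after a shuffle, so the order of values inside each list may differ from the input order on a real cluster. If the order matters, sorting the result (for example with `sort_array`) is one option.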

2021-05-29
