Scala: Is it possible to perform a groupBy that collects all columns together?

mdfafbf1 asked on 2023-04-12 in Scala

I'm on Apache Spark 3.3.2.

val df: Dataset[Row] = ???

df
 .groupBy($"someKey")
 .agg(collect_set(???)) // I want to collect all the columns here, including the key.

As mentioned in the comment in the snippet, I want to collect all the columns without having to spell them all out again. Is there a way to do this?


watbbzwu1#

You can use df.columns to access the list of columns of your DataFrame, then process it to build the list of aggregations you want:

# let's say that you want to group by "someKey", and collect the values
# of all the other columns.
from pyspark.sql import functions as F
result = df\
    .groupBy("someKey")\
    .agg(*[F.collect_set(c).alias(c) for c in df.columns if c != "someKey"])

Note: if you also want to collect the someKey column itself, simply drop the if c != "someKey" condition.
In Scala, the agg function has the following signature:
def agg(expr: Column, exprs: Column*): DataFrame
so we cannot unpack a list directly, but there is a simple workaround:

val aggs = df.columns
    .filter(_ != "someKey") // optional: filter on the column *name* before mapping it to a Column
    .map(c => collect_set(c) as c)
val result = df.groupBy("someKey").agg(aggs.head, aggs.tail : _*)
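
For illustration, here is a minimal, self-contained sketch of the same pattern; a spark-shell style session (with a SparkSession named spark) is assumed, and the sample data with the num and label columns is made up:

import org.apache.spark.sql.functions.collect_set
import spark.implicits._ // for .toDF on a local Seq

// Hypothetical sample data, just to show the shape of the result.
val sample = Seq(
  ("a", 1, "x"),
  ("a", 2, "y"),
  ("b", 3, "z")
).toDF("someKey", "num", "label")

val sampleAggs = sample.columns
  .filter(_ != "someKey")
  .map(c => collect_set(c).as(c))

sample.groupBy("someKey").agg(sampleAggs.head, sampleAggs.tail: _*).show()
// +-------+------+------+
// |someKey|   num| label|
// +-------+------+------+
// |      a|[1, 2]|[x, y]|
// |      b|   [3]|   [z]|
// +-------+------+------+
// (element order inside each collected set is not guaranteed)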

wgeznvg72#

If your goal is to aggregate all rows that share the same key into a list of JSON-like objects (an array of structs), you can do something like this:

import org.apache.spark.sql.functions._
import spark.implicits._ // needed for the $"*" / $"colName" syntax

val df = spark.createDataFrame(Seq(
      ("steak", "1990-01-01", "2022-03-30", 150),
      ("steak", "2000-01-02", "2021-01-13", 180),
      ("fish",  "1990-01-01", "2001-02-01", 100)
    )).toDF("key", "startDate", "endDate", "price")

df.show()

df
 .groupBy("key")
 .agg(collect_set(struct($"*")).as("value"))
 .show(false)

Output:

+-----+----------+----------+-----+
|  key| startDate|   endDate|price|
+-----+----------+----------+-----+
|steak|1990-01-01|2022-03-30|  150|
|steak|2000-01-02|2021-01-13|  180|
| fish|1990-01-01|2001-02-01|  100|
+-----+----------+----------+-----+

+-----+----------------------------------------------------------------------------+
|key  |value                                                                       |
+-----+----------------------------------------------------------------------------+
|steak|[{steak, 1990-01-01, 2022-03-30, 150}, {steak, 2000-01-02, 2021-01-13, 180}]|
|fish |[{fish, 1990-01-01, 2001-02-01, 100}]                                       |
+-----+----------------------------------------------------------------------------+
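
Note that the value column holds an array of structs, not JSON strings. If you actually need JSON text, a minimal sketch (using the same DataFrame as above) is to wrap the aggregation in to_json:

df
 .groupBy("key")
 .agg(to_json(collect_set(struct($"*"))).as("value"))
 .show(false)
// value is now one JSON string per key, e.g. for "steak":
// [{"key":"steak","startDate":"1990-01-01","endDate":"2022-03-30","price":150},{"key":"steak","startDate":"2000-01-02","endDate":"2021-01-13","price":180}]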
