I have data that looks like the table below:
+----+----+--------+-------+--------+----------------------+
|User|Shop|Location| Seller|Quantity| GroupBYClause|
+----+----+--------+-------+--------+----------------------+
| 1| ABC| Loc1|Seller1| 10| Shop, location|
| 1| ABC| Loc1|Seller2| 10| Shop, location|
| 2| ABC| Loc1|Seller1| 10|Shop, location, Seller|
| 2| ABC| Loc1|Seller2| 10|Shop, location, Seller|
| 3| BCD| Loc1|Seller1| 10| location|
| 3| BCD| Loc1|Seller2| 10| location|
| 3| CDE| Loc2|Seller3| 10| location|
+----+----+--------+-------+--------+----------------------+
The expected final output is the same data with an additional column, Sum(Quantity), where the sum of Quantity is computed according to the aggregation each user specifies.
For example, user 1 gave the GroupBYClause as "Shop, location", so irrespective of Seller, the Sum(Quantity) for user 1 is 20.
Similarly, for user 2 the GroupBYClause is "Shop, location, Seller", so the Sum(Quantity) for each of their rows is 10.
Expected output:
+------+----+--------+-------+--------+----------------------+-------------+
|UserId|Shop|location| Seller|Quantity| GroupBYClause|Sum(Quantity)|
+------+----+--------+-------+--------+----------------------+-------------+
| 1| ABC| Loc1|Seller1| 10| Shop, location| 20|
| 1| ABC| Loc1|Seller2| 10| Shop, location| 20|
| 2| ABC| Loc1|Seller1| 10|Shop, location, Seller| 10|
| 2| ABC| Loc1|Seller2| 10|Shop, location, Seller| 10|
| 3| BCD| Loc1|Seller1| 10| location| 20|
| 3| BCD| Loc1|Seller2| 10| location| 20|
| 3| CDE| Loc2|Seller3| 10| location| 10|
+------+----+--------+-------+--------+----------------------+-------------+
The challenge I am facing is using a column's value as the group-by clause in Spark.
Please help.
val df = spark.createDataFrame(Seq(
(1, "ABC","Loc1","Seller1", 10, "Shop, location"),
(1, "ABC","Loc1","Seller2", 10, "Shop, location"),
(2, "ABC","Loc1","Seller1", 10, "Shop, location, Seller"),
(2, "ABC","Loc1","Seller2", 10, "Shop, location, Seller"),
(3, "BCD","Loc1","Seller1", 10, "location"),
(3, "BCD","Loc1","Seller2", 10, "location"),
(3, "CDE","Loc2","Seller3", 10, "location")
)).toDF("UserId","Shop", "Location","Seller", "Quantity", "GroupBYClause")
3 Answers
ohfgkhjo1#
Try this -
Load the provided test data, compute the sum, and define the window partition dynamically from the GroupBYClause column.
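A minimal sketch of that idea, assuming the clause can only name the Shop, Location and Seller columns and reusing the df built in the question; the partKey helper column is purely illustrative:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Build a per-row partition key from the columns named in GroupBYClause.
// The map() literal enumerates the candidate columns (assumption: only
// Shop, Location and Seller can appear in the clause).
val withKey = df.withColumn(
  "partKey",
  expr("""
    concat_ws('|',
      transform(
        split(GroupBYClause, ','),
        c -> element_at(
               map('shop', Shop, 'location', Location, 'seller', Seller),
               lower(trim(c)))))
  """))

// Sum Quantity over the dynamically defined partition, per user.
val result = withKey
  .withColumn("Sum(Quantity)",
    sum("Quantity").over(Window.partitionBy("UserId", "partKey")))
  .drop("partKey")

result.orderBy("UserId", "Shop", "Seller").show(false)

The higher-order functions used inside expr (transform, element_at) need Spark 2.4+; on older versions the same key could be built with a small UDF.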
Edit-1 (based on comments)
6bc51xsx2#
You can collect all the distinct GroupBYClause values, build a window for each, and wrap them with when(..).otherwise(..):
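A sketch of that approach, assuming each user uses a single clause as in the sample data; the sumCol and partCols names are only for illustration:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Collect the distinct clauses to the driver (assumed to be a small set).
val clauses = df.select("GroupBYClause").distinct().collect().map(_.getString(0))

// For each clause build a window partitioned by UserId plus the listed
// columns, then chain the per-clause sums with when(...).otherwise(...).
val sumCol = clauses.foldLeft(lit(null).cast("long")) { (acc, clause) =>
  val partCols = "UserId" +: clause.split(",").map(_.trim.toLowerCase.capitalize)
  val w = Window.partitionBy(partCols.map(col): _*)
  when(col("GroupBYClause") === clause, sum("Quantity").over(w)).otherwise(acc)
}

df.withColumn("Sum(Quantity)", sumCol).show(false)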
dldeef673#
The cube function is useful here. Take a look:
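A minimal sketch of what that could look like, reusing the df from the question; the gid alias is purely illustrative:

import org.apache.spark.sql.functions._

// cube() computes the sum for every combination of the grouping columns;
// grouping_id() encodes which columns were rolled up at each level, so each
// input row can afterwards be matched to the level named in its GroupBYClause.
val cubed = df
  .cube("UserId", "Shop", "Location", "Seller")
  .agg(sum("Quantity").as("Sum(Quantity)"), grouping_id().as("gid"))

cubed.orderBy("UserId", "gid").show(false)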
Result (I've included all the columns for your understanding; if you plan to use this, make sure to clean them up)