Combining OVER (PARTITION BY ...) with GROUPING SETS?

Posted by lvmkulzt on 2021-06-24 in Hive

I have the following transaction table:

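I want to compute the total purchased quantity per:
- product
- category (i.e. the total quantity of all products within the same category)
- department (i.e. the total quantity of all products within the same department)
In addition, each of the totals above should be computed:
- per shopper
- per family (the sum of the total quantities of all shoppers in the same family).
The output table should look like this: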

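For the family, the total is computed once and then "copied" to every shopper in the same family.
To compute the totals per product / category / department across the table, I am using GROUPING SETS, as pointed out in my previous question. That gives me the correct total_quantity_individual (named total_quantity_shopper in the query below).
For total_quantity_family, I am using OVER (PARTITION BY), as pointed out here on a much simpler table.
However, I do not know how to combine the two. There is not much information about combining OVER (PARTITION BY) with GROUPING SETS.
My query looks like this: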

SELECT
    family_id,
    shopper_id,
    CASE
        WHEN GROUPING__ID = 6 THEN 'department'
        WHEN GROUPING__ID = 5 THEN 'category'
        WHEN GROUPING__ID = 3 THEN 'product'
    END AS total_level_type,
    CASE
        WHEN GROUPING__ID = 6 THEN department
        WHEN GROUPING__ID = 5 THEN category
        WHEN GROUPING__ID = 3 THEN product
    END AS id,
    SUM(quantity) AS total_quantity_shopper
    -- sum(sum(quantity)) OVER (PARTITION BY family_id, product) AS total_quantity_family
FROM
    transaction
GROUP BY
    family_id, 
    shopper_id,
    product,
    category,
    department
    GROUPING SETS (
        (family_id, shopper_id, product),
        (family_id, shopper_id, category),
        (family_id, shopper_id, department)
    )
ORDER BY
  total_level_type;
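
The values 3, 5 and 6 tested in the CASE expressions above can be derived by hand. The following is a minimal sketch in Python, assuming the bit-vector convention documented for Spark SQL's grouping_id() and for GROUPING__ID in Hive 2.3.0 and later: one bit per GROUP BY column, first column as the most significant bit, and a bit set to 1 when the column is absent from the grouping set. The helper name grouping_id is hypothetical, and older Hive releases number the bits differently.

```python
# GROUP BY column order, copied from the query in the question.
GROUP_BY_COLUMNS = ["family_id", "shopper_id", "product", "category", "department"]


def grouping_id(grouping_set, columns=GROUP_BY_COLUMNS):
    """Return the bit vector for one grouping set.

    Assumed convention: the first GROUP BY column is the most significant
    bit, and a bit is 1 when that column is NOT part of the grouping set.
    """
    gid = 0
    for col in columns:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


ids = {
    "product": grouping_id({"family_id", "shopper_id", "product"}),
    "category": grouping_id({"family_id", "shopper_id", "category"}),
    "department": grouping_id({"family_id", "shopper_id", "department"}),
}
print(ids)  # {'product': 3, 'category': 5, 'department': 6}
```

Under that assumption, the three grouping sets map to binary 00011, 00101 and 00110, which matches the constants used in the query.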

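If OVER (PARTITION BY) is not applicable to my case, my other options might be:
- Group transaction by family_id, then run the grouping sets on the result, then join the result back with transaction.
- Maybe a trick using explode() and LATERAL VIEW?
For maintainability reasons, I would really prefer not to split the query into separate individual and family versions.
Note: in case it helps, I am using Spark SQL with a Hive context.
Thanks for your help!
Edit: this seems to work: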

...
SUM(quantity) AS total_quantity_shopper,
CASE        
    WHEN GROUPING__ID = 6 THEN sum(sum(quantity)) OVER (PARTITION BY family_id, department)
    WHEN GROUPING__ID = 5 THEN sum(sum(quantity)) OVER (PARTITION BY family_id, category)
    WHEN GROUPING__ID = 3 THEN sum(sum(quantity)) OVER (PARTITION BY family_id, product)
END AS total_quantity_family
...
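
The idea behind the edit, summing the per-shopper totals again within each family and level, can be checked on a small data set. The sketch below uses Python's sqlite3 module only so that it runs anywhere. SQLite has no GROUPING SETS, so the three grouping sets are emulated with UNION ALL, and the window function runs over the already aggregated rows. The table name txn and the sample rows are made up for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: (family_id, shopper_id, product, category, department, quantity)
rows = [
    ("F1", "S1", "apple", "fruit", "food", 2),
    ("F1", "S1", "pear", "fruit", "food", 1),
    ("F1", "S2", "apple", "fruit", "food", 3),
    ("F2", "S3", "apple", "fruit", "food", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (family_id TEXT, shopper_id TEXT, product TEXT, "
    "category TEXT, department TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
WITH per_shopper AS (
    -- One GROUP BY per grouping set, stacked with UNION ALL.
    SELECT family_id, shopper_id, 'product' AS total_level_type,
           product AS id, SUM(quantity) AS total_quantity_shopper
    FROM txn GROUP BY family_id, shopper_id, product
    UNION ALL
    SELECT family_id, shopper_id, 'category', category, SUM(quantity)
    FROM txn GROUP BY family_id, shopper_id, category
    UNION ALL
    SELECT family_id, shopper_id, 'department', department, SUM(quantity)
    FROM txn GROUP BY family_id, shopper_id, department
)
SELECT family_id, shopper_id, total_level_type, id, total_quantity_shopper,
       -- Family total: sum of the shopper totals sharing family, level and id.
       SUM(total_quantity_shopper) OVER (
           PARTITION BY family_id, total_level_type, id
       ) AS total_quantity_family
FROM per_shopper
"""
result = conn.execute(query).fetchall()
for row in sorted(result):
    print(row)
```

In this data set, shoppers S1 and S2 of family F1 each get their own apple total (2 and 3) next to the same family total of 5, which is the "copied" behaviour described in the question.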
Answer #1 (3okqufwl)

Use multiple sum() OVER () expressions with different PARTITION BY clauses:

select
  family_id,
  shopper_id,
  total_level_type,
  id,
  total_quantity_individual,
  total_quantity_family
from
(
  select 
      family_id, 
      shopper_id,
      array(
        NAMED_STRUCT('id', product, 
                     'total_level_type', 'product',
                     'total_quantity_individual', sum(quantity) OVER (PARTITION BY family_id, shopper_id, product),
                     'total_quantity_family', sum(quantity) OVER (PARTITION BY family_id, product)
                     ),
        NAMED_STRUCT('id', category, 
                     'total_level_type', 'category',
                     'total_quantity_individual', sum(quantity) OVER (PARTITION BY family_id, shopper_id, category),
                     'total_quantity_family', sum(quantity) OVER (PARTITION BY family_id, category)
                     ),
        NAMED_STRUCT('id', department, 
                     'total_level_type', 'department', 
                     'total_quantity_individual', sum(quantity) OVER (PARTITION BY family_id, shopper_id, department),
                     'total_quantity_family', sum(quantity) OVER (PARTITION BY family_id, department)
                     )
      ) AS array_structs
  from
    transaction
) s  -- subquery alias; Hive requires a name for a subquery in FROM
lateral view inline(array_structs) exploded
order by
  total_level_type
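
The window-only approach above can be tried out in the same way. This is a rough sketch using Python's sqlite3 module, not the Hive query itself: SQLite has no array(), NAMED_STRUCT or inline(), so the three structs are emulated with three SELECTs joined by UNION ALL. The table name txn and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (family_id, shopper_id, product, category, department, quantity)
rows = [
    ("F1", "S1", "apple", "fruit", "food", 2),
    ("F1", "S1", "pear", "fruit", "food", 1),
    ("F1", "S2", "apple", "fruit", "food", 3),
    ("F2", "S3", "apple", "fruit", "food", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (family_id TEXT, shopper_id TEXT, product TEXT, "
    "category TEXT, department TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?, ?)", rows)


def level_select(level):
    """Build the SELECT for one level; `level` is a trusted column name."""
    return f"""
    SELECT family_id, shopper_id, '{level}' AS total_level_type, {level} AS id,
           SUM(quantity) OVER (PARTITION BY family_id, shopper_id, {level})
               AS total_quantity_individual,
           SUM(quantity) OVER (PARTITION BY family_id, {level})
               AS total_quantity_family
    FROM txn
    """


query = " UNION ALL ".join(
    level_select(level) for level in ("product", "category", "department")
)
result = conn.execute(query).fetchall()

print(len(result))       # one output row per input row and level
print(len(set(result)))  # number of distinct output rows
```

Because window functions do not collapse rows, the result has one row per input row and level. When a shopper has several transaction rows with the same id at some level (here S1 bought two different fruits), identical output rows appear, and a SELECT DISTINCT over the result would collapse them.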
