windows Array_Agg不支持Snowflake中的窗口框架-如何实现?

u1ehiz5o  于 2022-11-26  发布在  Windows
关注(0)|答案(4)|浏览(155)

我需要运行:

select arrayagg(o_clerk) 
  within group (order by o_orderkey desc) 
  OVER (PARTITION BY o_orderkey order by o_orderkey 
     ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS RESULT
from sample_data

但是Snowflake返回错误Sliding window frame unsupported for function ARRAYAGG。如果我试图在没有滑动窗口的情况下累加所有值,我会得到错误Cumulative window frame unsupported for function ARRAY_AGG
我如何才能做到这一点?
示例数据:

create or replace table sample_data as (
    with data as (
        select 1 a, [1,3,2,4,7,8,10] b
        union all select 2, [1,3,2,4,7,8,10]
    )

    select 'Ord'||a o_orderkey, 'c'||value o_clerk, index
    from data, table(flatten(b))
)
;

预期结果:

source,用于BigQuery迁移)

jjhzyzn0

jjhzyzn01#

我非常喜欢Emanuel在forums上的解决方案:

select o_orderkey, 
    array_compact([
        lag(o_clerk, 3) over(partition by o_orderkey order by index)
        , lag(o_clerk, 2) over(partition by o_orderkey order by index)
        , lag(o_clerk, 1) over(partition by o_orderkey order by index)
        , o_clerk
    ])
from sample_data

(如果Emanuel想在此添加自己的答案,我将删除此处的答案)

emeijp43

emeijp432#

谢谢@Felipe Hoffa这里是我的解决方案最初张贴在:
https://community.snowflake.com/s/question/0D73r000006upCECAY/detail?fromEmail=1&s1oid=00Di0000000hZh2&s1nid=0DB3100000001Fq&s1uid=0050Z000009Xcck&s1ext=0&emkind=chatterCommentNotification&emtm=1669252526852

select o_orderkey, 
    array_compact([
        lag(o_clerk, 3) over(partition by o_orderkey order by index)
        , lag(o_clerk, 2) over(partition by o_orderkey order by index)
        , lag(o_clerk, 1) over(partition by o_orderkey order by index)
        , o_clerk
    ])
from sample_data

mutmk8jj

mutmk8jj3#

由于我们不能用ARRAY_AGG()得到这些结果,我们可以通过一个self join。
首先我们给予每一行一个行号,然后选择行号和+3之间的所有行:

with numbered as (
    select o_orderkey, o_clerk, index
    from sample_data
), crossed as (
    select a.o_orderkey, a.index ai, b.index bi, b.o_clerk
    from numbered a
    join numbered b
    on a.o_orderkey = b.o_orderkey
    and a.index between b.index and b.index+3
)

select o_orderkey, array_agg(o_clerk) within group (order by bi)
from crossed
group by o_orderkey, ai
order by o_orderkey, max(bi)

请注意,我必须在原来的问题中添加一个index字段,这样我们就可以在行中有一个明确的排序顺序。

92vpleto

92vpleto4#

使用ARRAY_AGGARRAY_SLICE。灵感源自Rajat's answer

SELECT *
   ,IFF(ROW_NUMBER() OVER(PARTITION BY o_orderkey ORDER BY INDEX) <= 4, 0, 
        ROW_NUMBER() OVER(PARTITION BY o_orderkey ORDER BY INDEX)-4) AS start_index
   ,IFF(ROW_NUMBER() OVER(PARTITION BY o_orderkey ORDER BY INDEX) <= 4, 
        ROW_NUMBER() OVER(PARTITION BY o_orderkey ORDER BY INDEX),4) AS num_elem

   ,ARRAY_SLICE(
         ARRAY_AGG(o_clerk) WITHIN GROUP (ORDER BY INDEX)
                            OVER(PARTITION BY o_orderkey)
         ,start_index
         ,start_index + num_elem) 
FROM sample_data
ORDER BY O_ORDERKEY, INDEX;

输出量:

start_index和num_elem计算可以被进一步简化/一般化以仿真窗口滑动帧ROWS BETWEEN PRECEDING prec AND FOLLOWING foll

SELECT *
   ,ROW_NUMBER() OVER(PARTITION BY o_orderkey ORDER BY INDEX) AS rn
   ,3 AS prec
   ,0 AS foll
   ,ARRAY_SLICE(
         ARRAY_AGG(o_clerk) WITHIN GROUP (ORDER BY INDEX) 
                            OVER(PARTITION BY o_orderkey)
         ,IFF(rn <= prec+1, 0, rn-(prec+1))
         ,IFF(rn <= prec+1, 0, rn-(prec+1)) + IFF(rn <= prec+1, rn+foll,prec+1+foll)
   ) 
FROM sample_data
ORDER BY O_ORDERKEY, INDEX;

显示用于调试的ARRAY_SLICE的每个参数:

相关问题