使用窗口函数计算hive中的滚动周开销

axzmvihb  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(446)

我需要制定一份客户一周花费的分配表。每次有顾客买东西,我都想知道他们在过去一周里花了多少钱。我想用我的Hive代码来做这个。
我的数据集与此类似:
花销表

Cust_ID | Purch_Date | Purch_Amount  
1 | 1/1/19 | $10  
1 | 1/2/19 | $21  
1 | 1/3/19 | $30  
1 | 1/4/19 | $11  
1 | 1/5/19 | $21  
1 | 1/6/19 | $31  
1 | 1/7/19 | $41  
2 | 1/1/19 | $12  
2 | 1/2/19 | $22  
2 | 1/3/19 | $32  
2 | 1/5/19 | $42  
2 | 1/7/19 | $52  
2 | 1/9/19 | $62  
2 | 1/11/19 | $72

到目前为止,我已经尝试过类似的代码: Select Cust_ID, Purch_Date, Purch_Amount, sum(Purch_Amount) over (partition by Cust_ID order by unix_timestamp(Purch_Date) range between 604800 and current row) as Rolling_Spend from Spend_Table ```
Cust_ID | Purch_Date | Purch_Amount | Rolling_Spend
1 | 1/1/19 | $10 | $10
1 | 1/2/19 | $21 | $31
1 | 1/3/19 | $30 | $61
1 | 1/4/19 | $11 | $72
1 | 1/5/19 | $21 | $93
1 | 1/6/19 | $31 | $124
1 | 1/7/19 | $41 | $165
2 | 1/1/19 | $12 | $12
2 | 1/2/19 | $22 | $34
2 | 1/3/19 | $32 | $66
2 | 1/5/19 | $42 | $108
2 | 1/7/19 | $52 | $160
2 | 1/9/19 | $62 | $188
2 | 1/11/19 | $72 | $228

我相信问题出在我的范围之间,因为它似乎抓住了前面的行数。我希望它能在前几秒内抓取数据(604800是6天的秒数)。
我想做的事可行吗?我不能做前面的6行,因为不是每个客户都像客户2那样每天购买。非常感谢您的帮助!
更新:通过将原始代码更改为: `Select Cust_ID, Purch_Date, Purch_Amount, sum(Purch_Amount) over (partition by Cust_ID order by unix_timestamp(Purch_Date, 'MM-dd-yyyy') range between 604800 and current row) as Rolling_Spend from Spend_Table` 关键是在unix\u时间戳公式中指定日期格式
kq4fsx7k

kq4fsx7k1#

SELECT *, sum(some_value) OVER (
        PARTITION BY Cust_ID 
        ORDER BY CAST(Purch_Date AS timestamp) 
        RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW
     ) AS cummulativeSum FROM Spend_Table

https://cwiki.apache.org/confluence/display/hive/languagemanual+windowingandanalytics

相关问题