我有一个巨大的段落Hive查询代码如下:
select
count(distinct case when click_day between ${hiveconf:dt_180} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_180d,
count(distinct case when click_day between ${hiveconf:dt_90} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_90d,
count(distinct case when click_day between ${hiveconf:dt_30} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_30d,
count(distinct case when click_day between ${hiveconf:dt_15} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_15d,
count(distinct case when click_day between ${hiveconf:dt_7} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_7d
from mytable ;
这些字段之间的唯一区别是天数,它表示时间窗口的长度。这使得我的问题非常大,很难出错。
dt\u 15只是之前定义的字符串变量:
set dt_15 = CONCAT(SUBSTRING(date_sub(current_date,15), 1, 4), SUBSTRING(date_sub(current_date,15), 6, 2), SUBSTRING(date_sub(current_date,15), 9, 2));
有人能帮我把它重建得更简单吗?比如在新表中使用循环生成字段?
谢谢。
2条答案
按热度按时间yyyllmsg1#
试试这个:使用内置日期函数
因为这将删除concat和substring操作。
这将减少输入量。你也可以把
click_day < dt_end
如果对所有列都相同,则删除中间的。qnzebej02#
试试这个
附笔
你为什么要用非标准格式存储日期?