sql - Calculate the number of transactions per month for each individual, grouped by date (Hive)

axr492tv  posted on 2021-06-25  in Hive

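I have a customer transaction table in which every item a customer purchases is stored as a row, so a single transaction can span multiple rows. I also have a column called visit_date. There is a category column named cal_month_nbr that ranges from 1 to 12 according to the month in which the visit occurred.
The data looks like this: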

Id          visit_date     Cal_month_nbr
----        ------          ------
1           01/01/2020      1
1           01/02/2020      1
1           01/01/2020      1
2           02/01/2020      2
1           02/01/2020      2
1           03/01/2020      3
3           03/01/2020      3

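First, I want to know how many times each customer visited per month, counting each distinct visit date once, i.e. I want the output below: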

id          cal_month_nbr visit_per_month
----        ------------- ---------------
1           1             2
1           2             1
1           3             1
2           2             1
3           3             1

Second, what is the average visit frequency per month for each id?

id            Avg_freq_per_month
----          -------------
1              1.33
2              1
3              1
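
To make the expected numbers concrete, here is a minimal Python sketch that reproduces both tables from the sample rows. It is only an illustration of the intended arithmetic (distinct visit dates per id and month, then the mean of those monthly counts per id), not a Hive solution; the variable names are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (id, visit_date, cal_month_nbr).
rows = [
    (1, "01/01/2020", 1),
    (1, "01/02/2020", 1),
    (1, "01/01/2020", 1),  # second item bought on the same visit date
    (2, "02/01/2020", 2),
    (1, "02/01/2020", 2),
    (1, "03/01/2020", 3),
    (3, "03/01/2020", 3),
]

# Step 1: collect the distinct visit dates for each (id, month) pair.
dates_by_id_month = defaultdict(set)
for cust_id, visit_date, month in rows:
    dates_by_id_month[(cust_id, month)].add(visit_date)

visits_per_month = {key: len(dates) for key, dates in dates_by_id_month.items()}

# Step 2: average the monthly visit counts for each id.
counts_by_id = defaultdict(list)
for (cust_id, _month), n_visits in visits_per_month.items():
    counts_by_id[cust_id].append(n_visits)

avg_freq_per_month = {
    cust_id: sum(counts) / len(counts) for cust_id, counts in counts_by_id.items()
}

print(visits_per_month)
print(avg_freq_per_month)
```

For id 1 this gives monthly counts of 2, 1, and 1, so the average is 4 / 3, which rounds to the 1.33 shown above.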

I tried the query below, but it counts every item as a separate transaction:

select avg(count_e) as num_visits_per_month, individual_id
from
(
    -- count(*) counts every item row, not every distinct visit
    select r.individual_id, r.cal_month_nbr, count(*) as count_e
    from ww_customer_dl_secure.cust_scan r
    group by r.individual_id, r.cal_month_nbr
    order by count_e desc
) as t
group by individual_id

Any help, guidance, or suggestions would be much appreciated.

pxiryf3j  #1

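You can divide the total number of visits by the number of months. Note that count(*) counts every row, so a transaction with several items is counted once per item: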

select individual_id,
       count(*) / count(distinct cal_month_nbr)
from  ww_customer_dl_secure.cust_scan c
group by individual_id;

If you want the average number of distinct visit days per month, then:

select individual_id,
       count(distinct visit_date) / count(distinct cal_month_nbr)
from  ww_customer_dl_secure.cust_scan c
group by individual_id;
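
To see how the two queries above differ on the question's sample data, here is a rough check using Python's built-in sqlite3 module as a stand-in for Hive. This is a sketch under stated assumptions: the table is created locally as plain `cust_scan` (no schema prefix), dates are stored as text, and the `* 1.0` is added only because SQLite, unlike Hive, performs integer division when both operands of `/` are integers.

```python
import sqlite3

# Sample rows from the question: (individual_id, visit_date, cal_month_nbr).
rows = [
    (1, "01/01/2020", 1),
    (1, "01/02/2020", 1),
    (1, "01/01/2020", 1),
    (2, "02/01/2020", 2),
    (1, "02/01/2020", 2),
    (1, "03/01/2020", 3),
    (3, "03/01/2020", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cust_scan "
    "(individual_id integer, visit_date text, cal_month_nbr integer)"
)
conn.executemany("insert into cust_scan values (?, ?, ?)", rows)

# Rows per month: every item row counts as a visit.
rows_per_month = dict(conn.execute("""
    select individual_id,
           count(*) * 1.0 / count(distinct cal_month_nbr)
    from cust_scan
    group by individual_id
""").fetchall())

# Distinct visit days per month: repeated items on one date count once.
days_per_month = dict(conn.execute("""
    select individual_id,
           count(distinct visit_date) * 1.0 / count(distinct cal_month_nbr)
    from cust_scan
    group by individual_id
""").fetchall())

print(rows_per_month)
print(days_per_month)
conn.close()
```

For id 1 the first query divides 5 rows by 3 months, while the second divides 4 distinct dates by 3 months, which matches the 1.33 the question asks for.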

In fact, Hive may not be efficient at computing count(distinct), so a multi-level aggregation may be faster:

select individual_id, avg(num_visit_days)
from (select individual_id, cal_month_nbr, count(*) as num_visit_days
      from (select distinct individual_id, visit_date, cal_month_nbr
            from ww_customer_dl_secure.cust_scan c
           ) iv 
      group by individual_id, cal_month_nbr
     ) ic
group by individual_id;
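
The multi-level version can be tried on the sample data in the same way. The sketch below again uses Python's sqlite3 module in place of Hive, with a local `cust_scan` table assumed for illustration, so it only checks the logic of the query and says nothing about Hive performance.

```python
import sqlite3

# Sample rows from the question: (individual_id, visit_date, cal_month_nbr).
rows = [
    (1, "01/01/2020", 1),
    (1, "01/02/2020", 1),
    (1, "01/01/2020", 1),
    (2, "02/01/2020", 2),
    (1, "02/01/2020", 2),
    (1, "03/01/2020", 3),
    (3, "03/01/2020", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cust_scan "
    "(individual_id integer, visit_date text, cal_month_nbr integer)"
)
conn.executemany("insert into cust_scan values (?, ?, ?)", rows)

# Innermost level: one row per (individual, date, month).
# Middle level: number of visit days per individual and month.
# Outer level: average of those monthly counts per individual.
result = conn.execute("""
    select individual_id, avg(num_visit_days)
    from (select individual_id, cal_month_nbr, count(*) as num_visit_days
          from (select distinct individual_id, visit_date, cal_month_nbr
                from cust_scan
               ) iv
          group by individual_id, cal_month_nbr
         ) ic
    group by individual_id
    order by individual_id
""").fetchall()

for individual_id, avg_visits in result:
    print(individual_id, round(avg_visits, 2))
conn.close()
```

The `select distinct` step is what collapses multiple item rows from the same visit date into one, so the later `count(*)` counts visit days rather than items.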
