Select all dates, all customers, and each customer's last transaction date as of the given date in Hive

Posted by nr9pn0ug on 2021-06-26 in Hive

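I have 3 tables in Hive:
- a calendar table (all dates within a given period)
- a customer table
- a customer transaction list

I need to join these so that, for a given date, I get all customers and their last transaction up to that date. The last transaction should be null only if there is no transaction before that date (I mean the last transaction before the current calendar record).
Calendar sample: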

+----------+
|date      |
+----------+
|2017-06-01|
|2017-06-02|
|2017-06-03|
|2017-06-04|
|2017-06-05|
|2017-06-06|
|2017-06-07|
|2017-06-08|
|2017-06-09|
|2017-06-10|
+----------+

Customer sample:

+------------+
|customer_id |
+------------+
|11544049690 |
|15506698252 |
|67015354024 |
|43622453087 |
|509         |
|42859528435 |
|506         |
|10669246896 |
|33355892704 |
|500         |
+------------+

Transaction sample:

+------------+----------+
|customer_id |trx_date  |
+------------+----------+
|43622453087 |2018-05-30|
|509         |2017-10-04|
|509         |2018-01-09|
|509         |2017-11-07|
|509         |2018-01-30|
|506         |2017-10-04|
|506         |2017-12-21|
|506         |2017-11-07|
|506         |2017-11-07|
|500         |2017-10-04|
+------------+----------+

The result should look roughly like this:

+----------+------------+--------------+
|date      |customer_id |last_trx_date |
+----------+------------+--------------+
|2017-10-04|11544049690 |              |
|2017-10-04|15506698252 |              |
|2017-10-04|67015354024 |              |
|2017-10-04|43622453087 |              |
|2017-10-04|509         |2017-10-04    |
|2017-10-04|42859528435 |              |
|2017-10-04|506         |2017-10-04    |
|2017-10-04|10669246896 |              |
|2017-10-04|33355892704 |              |
|2017-10-04|500         |2017-10-04    |
|2017-10-05|11544049690 |              |
|2017-10-05|15506698252 |              |
|2017-10-05|67015354024 |              |
|2017-10-05|43622453087 |              |
|2017-10-05|509         |2017-10-04    |
|2017-10-05|42859528435 |              |
|2017-10-05|506         |2017-10-04    |
|2017-10-05|10669246896 |              |
|2017-10-05|33355892704 |              |
|2017-10-05|500         |2017-10-04    |
|2017-10-06|11544049690 |              |
|2017-10-06|15506698252 |              |
|2017-10-06|67015354024 |              |
|2017-10-06|43622453087 |              |
|2017-10-06|509         |2017-10-04    |
|2017-10-06|42859528435 |              |
|2017-10-06|506         |2017-10-04    |
|2017-10-06|10669246896 |              |
|2017-10-06|33355892704 |              |
|2017-10-06|500         |2017-10-04    |
.
.
.
|2017-11-07|11544049690 |              |
|2017-11-07|15506698252 |              |
|2017-11-07|67015354024 |              |
|2017-11-07|43622453087 |              |
|2017-11-07|509         |2017-11-07    |
|2017-11-07|42859528435 |              |
|2017-11-07|506         |2017-11-07    |
|2017-11-07|10669246896 |              |
|2017-11-07|33355892704 |              |
|2017-11-07|500         |2017-10-04    |
+----------+------------+--------------+

My latest attempt was:

SELECT

    cal.date as calendar_date,
    c.customer_id,
    to_date(trx.tstamp) as trx_date,
    max(to_date(trx.tstamp)) over (
        order by trx.date, trx.customer_id rows unbounded preceding) as last_trx
    FROM
       calendartable cal
    LEFT JOIN customer t1
    LEFT JOIN transactions t2          
    ON (c.customer_id == trx.customer_id) 

    WHERE to_date(cal.date) <= current_date or cal.date is null

Answer #1 (osh3o9ms)

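You need a cross join to generate one row per customer for every date in the calendar table. A left join to the transactions table, followed by aggregation, then produces the desired result.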

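-- One row per (calendar date, customer); last_trx stays NULL when the
-- customer has no transaction on or before that calendar date.
SELECT cal.date AS calendar_date,
       cst.customer_id,
       max(to_date(trx.tstamp)) AS last_trx
FROM calendartable cal
CROSS JOIN customer cst
LEFT JOIN transactions trx
       ON cst.customer_id = trx.customer_id
      -- compare dates, not raw timestamps, so same-day transactions count
      AND to_date(trx.tstamp) <= cal.date
WHERE to_date(cal.date) <= current_date
GROUP BY cal.date, cst.customer_id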
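
The approach in this answer (cross join, then left join with aggregation) can be checked on a small sample. The following is a minimal sketch, not Hive itself: it assumes Python's built-in sqlite3 module as a stand-in, uses SQLite's date() in place of Hive's to_date(), and attaches made-up times of day to a few of the sample transactions. It compares dates rather than raw timestamps in the join condition, so that a transaction made during a calendar day counts for that day. The second query, a conditional aggregation, is an alternative that keeps only an equality condition in ON, which may matter on Hive versions that reject non-equality join conditions.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendartable (date TEXT);
    CREATE TABLE customer (customer_id TEXT);
    CREATE TABLE transactions (customer_id TEXT, tstamp TEXT);
""")

conn.executemany(
    "INSERT INTO calendartable VALUES (?)",
    [("2017-10-03",), ("2017-10-04",), ("2017-10-05",), ("2017-11-07",)],
)
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("500",), ("506",), ("43622453087",)],
)
# Dates come from the question's sample; the times of day are hypothetical.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("500", "2017-10-04 09:15:00"),
        ("506", "2017-10-04 18:30:00"),
        ("506", "2017-11-07 08:00:00"),
        ("506", "2017-12-21 12:00:00"),
        ("43622453087", "2018-05-30 10:00:00"),
    ],
)

# Variant 1: date filter inside the LEFT JOIN condition.
JOIN_FILTER = """
    SELECT cal.date AS calendar_date,
           cst.customer_id,
           max(date(trx.tstamp)) AS last_trx
    FROM calendartable cal
    CROSS JOIN customer cst
    LEFT JOIN transactions trx
           ON cst.customer_id = trx.customer_id
          AND date(trx.tstamp) <= cal.date
    WHERE cal.date <= date('now')
    GROUP BY cal.date, cst.customer_id
    ORDER BY cal.date, cst.customer_id
"""

# Variant 2: equality-only join, date filter moved into the aggregate.
CASE_FILTER = """
    SELECT cal.date AS calendar_date,
           cst.customer_id,
           max(CASE WHEN date(trx.tstamp) <= cal.date
                    THEN date(trx.tstamp) END) AS last_trx
    FROM calendartable cal
    CROSS JOIN customer cst
    LEFT JOIN transactions trx
           ON cst.customer_id = trx.customer_id
    WHERE cal.date <= date('now')
    GROUP BY cal.date, cst.customer_id
    ORDER BY cal.date, cst.customer_id
"""

rows = conn.execute(JOIN_FILTER).fetchall()
rows_case = conn.execute(CASE_FILTER).fetchall()

# Print one line per (calendar date, customer); None means no transaction yet.
for calendar_date, customer_id, last_trx in rows:
    print(calendar_date, customer_id, last_trx)
```

Both variants return one row per calendar date and customer. Variant 2 joins every transaction of a customer to every calendar date before aggregating, so it may process more intermediate rows than variant 1 on large tables.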
