在hive表中生成前导值

z9ju0rcb  于 2021-07-09  发布在  Spark
关注(0)|答案(1)|浏览(337)

我有两个表格,如下面的cust\u msg和cust\u audit。

cust_msg
+-----------+----------+------------+----------+                                        
|cust_id    |first_name|progress    |process_dt|
+-----------+----------+------------+----------+
|106674     |Charley   |Initiate    |20210202  |
|106674     |Charley   |Review      |20210203  |
|106674     |Charley   |Realign     |20210204  |
|106674     |Charley   |Approved    |20210211  |
|106674     |Charley   |Installation|20210216  |
|106674     |Charley   |Survey      |20210323  |
+-----------+----------+------------+----------+

cust_audit
+-----------+----------+----------------------------+----------+                                        
|cust_id    |agent_id  |adt_log                     |process_dt|
+-----------+----------+----------------------------+----------+
|106674     |602       |Promotional Offer sent      |20210112  |
|106674     |602       |Click Open Promo            |20210113  |
|106674     |100       |Promo Inquiry               |20210114  |
|106674     |100       |Cust Waiting                |20210118  |
|106674     |100       |Customer Application        |20210119  |
|106674     |602       |Appl Approved               |20210122  |
|106674     |602       |Initiate Appl               |20210201  |
|106674     |602       |Sale Initiated              |20210202  |
|106674     |602       |sale Rv Pending             |20210203  |
|106674     |602       |sales-cust Realign          |20210203  |
|106674     |602       |cust in aggrement           |20210204  |
|106674     |602       |Sales Dep Approve           |20210208  |
|106674     |602       |mgt Approved                |20210211  |
|106674     |602       |Installation pending        |20210216  |
|106674     |602       |Cust Survey                 |20210323  |
+-----------+----------+----------------------------+----------+

我需要再建一张table cust_del_detail 如下所示 progress 应该是 Initiate 直到 process_dt = 20210202 作为 progress 在表中 cust_msgInitiate 20210202年, progress 应该是 Review20210203 & progress 应该是 Realign2021020420210208 &同样地 Approved20210211 & Installation20210216 & Survey20210323 ```
cust_del_detail
+-----------+--------------+----------+---------------------------+---------------+----------+
|cust_id |first_name |agent_id |adt_log |progress |process_dt|
+-----------+--------------+----------+---------------------------+---------------+----------+
|106674 |Charley |602 |Promotional Offer sent |Initiate |20210112 |
|106674 |Charley |602 |Click Open Promo |Initiate |20210113 |
|106674 |Charley |100 |Promo Inquiry |Initiate |20210114 |
|106674 |Charley |100 |Cust Waiting |Initiate |20210118 |
|106674 |Charley |100 |Customer Application |Initiate |20210119 |
|106674 |Charley |602 |Appl Approved |Initiate |20210122 |
|106674 |Charley |602 |Initiate Appl |Initiate |20210201 |
|106674 |Charley |602 |Sale Initiated |Initiate |20210202 |
|106674 |Charley |602 |sale Rv Pending |Review |20210203 |
|106674 |Charley |602 |sales-cust Realign |Review |20210203 |
|106674 |Charley |602 |cust in aggrement |Realign |20210204 |
|106674 |Charley |602 |Sales Dep Approve |Realign |20210208 |
|106674 |Charley |602 |mgt Approved |Approved |20210211 |
|106674 |Charley |602 |Installation pending |Installation |20210216 |
|106674 |Charley |602 |Cust Survey |Survey |20210323 |
+-----------+--------------+----------+---------------------------+---------------+----------+

我尝试在Hive中使用铅窗口功能,但无法实现它。在hive或pyspark中实现这一点的最佳方法是什么?
mqkwyuun

mqkwyuun1#

你可以用 lead 然后加入:

from pyspark.sql import functions as F, Window

cust_del_detail = cust_msg.withColumn(
    'lead_dt', 
    F.lead('process_dt').over(Window.partitionBy('cust_id').orderBy('process_dt'))
).alias('cust_msg').join(
    cust_audit.alias('cust_audit'), 
    F.expr('''
        (progress != "Initiate" 
         and (cust_audit.process_dt < cust_msg.lead_dt or cust_msg.lead_dt is null) 
         and (cust_audit.process_dt >= cust_msg.process_dt)
        ) 
        or 
        (progress = "Initiate" 
         and (cust_audit.process_dt <= cust_msg.process_dt)
        )
    '''),
    'right'
).selectExpr(
    'cust_msg.cust_id', 
    'cust_msg.first_name', 
    'cust_audit.agent_id', 
    'cust_audit.adt_log', 
    'cust_msg.progress', 
    'cust_audit.process_dt'
)

结果:

cust_del_detail.show(truncate=False)
+-------+----------+--------+----------------------+------------+----------+
|cust_id|first_name|agent_id|adt_log               |progress    |process_dt|
+-------+----------+--------+----------------------+------------+----------+
|106674 |Charley   |602     |Promotional Offer sent|Initiate    |20210112  |
|106674 |Charley   |602     |Click Open Promo      |Initiate    |20210113  |
|106674 |Charley   |100     |Promo Inquiry         |Initiate    |20210114  |
|106674 |Charley   |100     |Cust Waiting          |Initiate    |20210118  |
|106674 |Charley   |100     |Customer Application  |Initiate    |20210119  |
|106674 |Charley   |602     |Appl Approved         |Initiate    |20210122  |
|106674 |Charley   |602     |Initiate Appl         |Initiate    |20210201  |
|106674 |Charley   |602     |Sale Initiated        |Initiate    |20210202  |
|106674 |Charley   |602     |sale Rv Pending       |Review      |20210203  |
|106674 |Charley   |602     |sales-cust Realign    |Review      |20210203  |
|106674 |Charley   |602     |cust in aggrement     |Realign     |20210204  |
|106674 |Charley   |602     |Sales Dep Approve     |Realign     |20210208  |
|106674 |Charley   |602     |mgt Approved          |Approved    |20210211  |
|106674 |Charley   |602     |Installation pending  |Installation|20210216  |
|106674 |Charley   |602     |Cust Survey           |Survey      |20210323  |
+-------+----------+--------+----------------------+------------+----------+

相关问题