在mysql中通过2列拆分列值

yv5phkfx  于 2023-02-07  发布在  Mysql
关注(0)|答案(1)|浏览(152)

我有以下数据集:

timestamp   converationId   UserId  MessageId       tpMessage   Message 
1614578324  ceb9004ae9d3    1c376ef 5bbd34859329    question    Where do you live?
1614578881  ceb9004ae9d3    1c376ef d3b5d3884152    answer      Brooklyn
1614583764  ceb9004ae9d3    1c376ef 0e4501fcd61f    question    What's your name?
1614590885  ceb9004ae9d3    1c376ef 97d841b79ff7    answer      Phill
1614594952  ceb9004ae9d3    1c376ef 11ed3fd24767    question    What's your gender?
1614602036  ceb9004ae9d3    1c376ef 601538860004    answer      Male
1614602581  ceb9004ae9d3    1c376ef 8bc8d9089609    question    How old are you?
1614606219  ceb9004ae9d3    1c376ef a2bd45e64b7c    answer      35

我想拆分timestap列,一个是问题的时间戳,另一个是答案的时间戳
为此,我使用下面的case语句创建了一个新表(步骤2):

case when tpMessage = 'question' 
    then timestamp
    end  ts_question
,case when tpMessage = 'answer'
    then ts_timestamp
    end  ts_answer

我尝试了许多不同形式的内连接和左连接,但都没有得到我想要的结果,如下所示:

ts_question ts_answer   converationId   UserId
1614578324  1614578881  ceb9004ae9d3    1c376ef
1614583764  1614590885  ceb9004ae9d3    1c376ef
1614594952  1614602036  ceb9004ae9d3    1c376ef
1614602581  1614606219  ceb9004ae9d3    1c376ef

有了这个关键字,我就可以转到我的第一个表,得到消息列,并将它转换成两个新列,问题和答案,我的最终输出将是:

ts_question ts_answer   converationId   UserId      Question            Answer
1614578324  1614578881  ceb9004ae9d3    1c376ef     What's your name?   Phill   
1614583764  1614590885  ceb9004ae9d3    1c376ef     Where do you live?  Brooklyn
1614594952  1614602036  ceb9004ae9d3    1c376ef     How old are you?    35  
1614602581  1614606219  ceb9004ae9d3    1c376ef     What's your gender? Male

我很高兴你们能帮我完成第二步。
谢谢

ddrv8njm

ddrv8njm1#

如果问题和答案总是在用户的对话中适当地交错,那么我们可以使用窗口函数枚举这两个系列,然后透视:

select 
    max(case when tpMmessage = 'question' then timestamp end) ts_question,
    max(case when tpMmessage = 'answer'   then timestamp end) ts_answer,
    conversationId
    userId, 
    max(case when tpMmessage = 'question' then message end) question,
    max(case when tpMmessage = 'answer'   then message end) answer,
from (
    select t.*, 
        row_number() over(partition by userId, conversationId order by timestamp) rn
    from mytable t
) t
group by userId, conversationId, rn

如果问题或答案中存在空白或孤岛,那么这个方法就不起作用了。下面是一个使用窗口函数为每个问题(如果有的话)提供第一个答案的方法:

select 
    timestamp ts_question,
    case when lead_tp = 'answer' then lead_timestamp end ts_answer,
    conversationId,
    userId,
    message question,
    case when lead_tp = 'answer' then lead_message end answer
from (
    select t.*, 
        lead(tpMessage) over(partition by userId, conversationId order by timestamp) lead_tp,
        lead(timestamp) over(partition by userId, conversationId order by timestamp) lead_timestamp,
        lead(message) over(partition by userId, conversationId order by timestamp) lead_message,
    from mytable t
) t
where tpMessage = 'question'

相关问题