Hive query

vwkv1x7d posted on 2021-05-29 in Hadoop

I have the following table in Hive:

accountNum  date  status  action  qty  time
----------  ----  ------  ------  ---  -----
1234        2017  filled  B       10   11:20
1234        2017  filled  S       10   11:20
2345        2017  filled  B       20   12:00
2345        2017  filled  B       10   12:00
4444        2017  filled  B       5    01:00
4444        2017  filled  S       5    02:00

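Here I want to compare two rows: one with action 'B' followed by one with action 'S'. If I find such a pair, where the first row is B and the next one is S, I need to check whether accountNum, date, time and status are the same.
So, based on the test data above, I should get only the first 2 rows: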

accountNum  date  status  action  qty  time
----------  ----  ------  ------  ---  -----
1234        2017  filled  B       10   11:20
1234        2017  filled  S       10   11:20

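What kind of query should I write for this?
I have the MySQL query below, but Hive doesn't support the distinct count in it, so it doesn't work in Hive. Is there any way to write this query using HAVING or a JOIN?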

select  t1.*
from    yourTable t1
join    (
            select  accountNum, date, status, time
            from    yourTable
            where   action in ('B', 'S')
            group by accountNum, date, status, time
            having  count(distinct action) = 2
        ) t2
on      t1.accountNum = t2.accountNum and
        t1.date = t2.date and
        t1.status = t2.status and
        t1.time = t2.time
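To check the grouping logic locally before running anything on a cluster, here is a minimal Python sketch that loads the sample rows into an in-memory SQLite database and runs the MySQL query above. SQLite is only a stand-in for MySQL/Hive here (an assumption: its dialect differs, although this particular query behaves the same way); the helper `run_on_sample` and the constant names are hypothetical, and the only change to the query is a trailing `order by` so the output order is deterministic.

```python
import sqlite3

# Sample rows from the question: (accountNum, date, status, action, qty, time)
SAMPLE_ROWS = [
    (1234, 2017, "filled", "B", 10, "11:20"),
    (1234, 2017, "filled", "S", 10, "11:20"),
    (2345, 2017, "filled", "B", 20, "12:00"),
    (2345, 2017, "filled", "B", 10, "12:00"),
    (4444, 2017, "filled", "B", 5, "01:00"),
    (4444, 2017, "filled", "S", 5, "02:00"),
]

# The MySQL query from the question; only "order by t1.action" is added
# so that the B row always comes before the S row in the output.
MYSQL_QUERY = """
select  t1.*
from    yourTable t1
join    (
            select  accountNum, date, status, time
            from    yourTable
            where   action in ('B', 'S')
            group by accountNum, date, status, time
            having  count(distinct action) = 2
        ) t2
on      t1.accountNum = t2.accountNum and
        t1.date = t2.date and
        t1.status = t2.status and
        t1.time = t2.time
order by t1.action
"""


def run_on_sample(sql):
    """Hypothetical helper: run `sql` against an in-memory SQLite copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table yourTable (accountNum integer, date integer,"
            " status text, action text, qty integer, time text)"
        )
        conn.executemany("insert into yourTable values (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_on_sample(MYSQL_QUERY):
        print(row)
```

Only the 1234 group contains two distinct actions, so only its two rows survive the join; the 2345 group has two B rows and the two 4444 rows differ in time.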

i5desfxk (answer #1)

1. date is a reserved word in Hive, so it has to be quoted with backticks: `date`.

2. There seems to be a limitation on using expressions in HAVING when they do not appear in the SELECT clause.
This query (based on your original query) works:

select  t1.*
from    yourTable t1
join    (
            select  accountNum, `date`, status, time, count(distinct action)
            from    yourTable
            where   action in ('B', 'S')
            group by accountNum, `date`, status, time
            having  count(distinct action) = 2
        ) t2
on      t1.accountNum = t2.accountNum and
        t1.`date` = t2.`date` and
        t1.status = t2.status and
        t1.time = t2.time
+------------+------+--------+--------+-----+-------+
| accountnum | date | status | action | qty | time  |
+------------+------+--------+--------+-----+-------+
|       1234 | 2017 | filled | B      |  10 | 11:20 |
|       1234 | 2017 | filled | S      |  10 | 11:20 |
+------------+------+--------+--------+-----+-------+
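If `count(distinct ...)` is still awkward in your Hive version, one possible workaround (my own suggestion, not part of the answer above) is conditional aggregation: a group qualifies when it contains at least one `B` row and at least one `S` row. The flags are computed in the subquery's SELECT list and filtered in an outer WHERE, which also sidesteps the HAVING limitation mentioned above. The sketch below checks the logic in SQLite as a stand-in for Hive (SQLite accepts the backtick quoting used for `date`); that Hive accepts this exact text is an assumption to verify on your cluster, and `run_on_sample` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the question: (accountNum, date, status, action, qty, time)
SAMPLE_ROWS = [
    (1234, 2017, "filled", "B", 10, "11:20"),
    (1234, 2017, "filled", "S", 10, "11:20"),
    (2345, 2017, "filled", "B", 20, "12:00"),
    (2345, 2017, "filled", "B", 10, "12:00"),
    (4444, 2017, "filled", "B", 5, "01:00"),
    (4444, 2017, "filled", "S", 5, "02:00"),
]

# has_b / has_s are 1 when the group contains at least one B / S row, else 0.
FLAG_QUERY = """
select  t1.*
from    yourTable t1
join    (
            select  accountNum, `date`, status, time,
                    max(case when action = 'B' then 1 else 0 end) as has_b,
                    max(case when action = 'S' then 1 else 0 end) as has_s
            from    yourTable
            where   action in ('B', 'S')
            group by accountNum, `date`, status, time
        ) t2
on      t1.accountNum = t2.accountNum and
        t1.`date` = t2.`date` and
        t1.status = t2.status and
        t1.time = t2.time
where   t2.has_b = 1 and t2.has_s = 1
order by t1.action
"""


def run_on_sample(sql):
    """Hypothetical helper: run `sql` against an in-memory SQLite copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table yourTable (accountNum integer, `date` integer,"
            " status text, action text, qty integer, time text)"
        )
        conn.executemany("insert into yourTable values (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_on_sample(FLAG_QUERY):
        print(row)
```

Because the inner WHERE already restricts actions to B and S, "has a B and has an S" is equivalent to `count(distinct action) = 2` for these groups.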

Here is another solution based on window functions:

select  accountnum,`date`,status,action,qty,time

from   (select  *
               ,max(case when action = 'B' then 1 end) over w as b_flag
               ,max(case when action = 'S' then 1 end) over w as s_flag

        from    yourTable

        where   action in ('B', 'S')

        window  w as (partition by  accountNum, `date`, status, time)
        ) t

where   b_flag = 1
    and s_flag = 1
;
+------------+------+--------+--------+-----+-------+
| accountnum | date | status | action | qty | time  |
+------------+------+--------+--------+-----+-------+
|       1234 | 2017 | filled | B      |  10 | 11:20 |
|       1234 | 2017 | filled | S      |  10 | 11:20 |
+------------+------+--------+--------+-----+-------+
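The window-function version can be sanity-checked the same way. This is a minimal sketch using SQLite as a stand-in for Hive; the assumptions are that your SQLite build is 3.25 or newer (needed for window functions) and that inlining the named `window w` clause as `over (partition by ...)` is equivalent, which it is in standard SQL. `run_on_sample` is again a hypothetical helper. Each row gets `b_flag`/`s_flag` set to 1 if any row in its partition has that action (otherwise NULL), so the outer WHERE keeps only rows whose partition has both.

```python
import sqlite3

# Sample rows from the question: (accountNum, date, status, action, qty, time)
SAMPLE_ROWS = [
    (1234, 2017, "filled", "B", 10, "11:20"),
    (1234, 2017, "filled", "S", 10, "11:20"),
    (2345, 2017, "filled", "B", 20, "12:00"),
    (2345, 2017, "filled", "B", 10, "12:00"),
    (4444, 2017, "filled", "B", 5, "01:00"),
    (4444, 2017, "filled", "S", 5, "02:00"),
]

# Same logic as the Hive query above, with "window w as (...)" inlined.
WINDOW_QUERY = """
select  accountNum, `date`, status, action, qty, time
from    (select  *
                ,max(case when action = 'B' then 1 end)
                     over (partition by accountNum, `date`, status, time) as b_flag
                ,max(case when action = 'S' then 1 end)
                     over (partition by accountNum, `date`, status, time) as s_flag
         from    yourTable
         where   action in ('B', 'S')
        ) t
where   b_flag = 1
    and s_flag = 1
order by action
"""


def run_on_sample(sql):
    """Hypothetical helper: run `sql` against an in-memory SQLite copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table yourTable (accountNum integer, `date` integer,"
            " status text, action text, qty integer, time text)"
        )
        conn.executemany("insert into yourTable values (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_on_sample(WINDOW_QUERY):
        print(row)
```

Unlike the join-based query, this reads the table only once, which is usually the reason to prefer the window-function form on large Hive tables.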
