在配置单元中有一个包含以下数据的表。我正试着玩弄它:
A B C D
==============
76 5 0.6 107777
78 5 0.5 107777
79 5 0.5 107777
79 5 0.5 107777
80 5 0.5 107777
210 5 0.5 107777
211 5 0.5 107777
213 5 0.5 107777
316 5 0.5 107777
316 5 0.5 107777
76 7 0.5 102997
78 7 0.5 102997
79 8 0.5 102997
79 8 0.5 102997
80 9 0.5 108997
80 9 0.5 108997
80 9 0.5 108997
Need to count the 'B and D' when B>4 and C is not same for B and D.
预期o/p:
在这里查找值,其中,对于“a”和“b”中的相同值,“c”中的值不存在。同时显示表中存在的重复值(大于1)。
Value in 'C' is not present for the same value in 'A' and 'B':
这意味着
A B C
=====
76 5 0.6 => OK
78 5 0.5 => OK
79 5 0.5 => OK
79 5 0.5 => NOT OK (As C=0.5 shouldn't have repetitive value for same A and B value)
80 5 0.5 => OK.....
A B C D
==============
79 5 0.5 107777
316 5 0.5 107777
79 8 0.5 102997
80 9 0.5 108997
80 9 0.5 108997
Count: 5
能够为第一部分编写查询,但对其余部分没有任何突破:
SELECT A,B,C,D FROM DB.TABLE1 WHERE B >1;
但不知道如何写最后一部分:
count the 'B and D' when B>4 and C is not same for B and D.
任何关于这方面的建议都会很有帮助。
更新\u 1:
尝试了以下操作:
SELECT A,B,C,D FROM (SELECT * FROM TABLE1 WHERE B >4) t1 GROUP BY B,D HAVING countnum>1 LIMIT 20;
但是得到错误:
FAILED: SemanticException [Error 10025]: Line 1:197 Expression not in GROUP BY key '1'
hive>
1条答案
按热度按时间5kgi1eie1#
Need to count the 'B and D' when B>4 and C is not same for B and D.
输入:table1
```A B C D
76 5 0.6 107777
78 5 0.5 107777
79 5 0.5 107777
79 5 0.5 107777
80 5 0.5 107777
210 5 0.5 107777
211 5 0.5 107777
213 5 0.5 107777
316 5 0.5 107777
316 5 0.5 107777
76 7 0.5 102997
78 7 0.5 102997
79 8 0.5 102997
79 8 0.5 102997
80 9 0.5 108997
80 9 0.5 108997
80 9 0.5 108997
select count(*)
from (
select *, row_number() over (partition by B, C, D) as rn
from table1
where B>4
) as t1
where rn=1;
A B C D rn
76 5 0.6 107777 1
78 5 0.5 107777 1
79 5 0.5 107777 2
79 5 0.5 107777 3
80 5 0.5 107777 4
210 5 0.5 107777 5
211 5 0.5 107777 6
213 5 0.5 107777 7
316 5 0.5 107777 8
316 5 0.5 107777 9
76 7 0.5 102997 1
78 7 0.5 102997 2
79 8 0.5 102997 1
79 8 0.5 102997 2
80 9 0.5 108997 1
80 9 0.5 108997 2
80 9 0.5 108997 3