行数()over()语法的配置单元出现问题

laximzn5  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(336)

我有Hive表(详情如下):

hive> select * from abcd ;
OK
a   1   1
b   2   2
a   3   3
Time taken: 0.261 seconds, Fetched: 3 row(s)
hive> desc abcd;
OK
val001                  string                                      
val002                  int                                         
val003                  int                                         
Time taken: 0.084 seconds, Fetched: 3 row(s)

我正在写以下查询,但收到以下错误:

select max(rnk) rnk, max(val) val, sum(cnt) cnt from (select val, count(*) cnt, row_number() over (order by case val  when null then 0 else count(*) end desc, val) rnk from (select VAL001 val from abcd ) group by val) group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;

FAILED: ParseException line 3:55 missing ) at 'by' near 'by'
line 3:58 missing EOF at 'val' near 'by'

我正在寻找上述查询的以下结果:

RNK VAL                CNT
--- ------------------------------ ---
1   a                    2
2   b                    1

我能够从oracle数据库中实现同样的功能,该数据库具有类似的表。唯一的区别是我在oracledb中使用了orderbydecode而不是orderbycase,但是由于hive不支持decode,所以我不能这样做。
请查找正在运行的oracle db sql查询:

SQL> select max(rnk) rnk, max(val) val, sum(cnt) cnt from 
    (select val, count(*) cnt, row_number() over (order by 
    decode(val,null,0,count(*)) desc, val) rnk from (select VAL001 val from 
    table_name ) group by val)
    group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;   

RNK VAL                CNT
--- ------------------------------ ---
 1 a                     2
 2 b                     1

有人能帮我解决Hive问题吗。如果你需要更多的细节,请告诉我。

oknrviil

oknrviil1#

这不是技术上更简单的解决方案,但可能更容易阅读:
第一个子查询执行计数和排序,
第二个子查询是 top 1 - top 100 以及 other (top) 以及 unknown .
最后一个查询进行分组。

with cnt as (
 select VAL001 val, 
  count(*)  as cnt, 
  row_number() over (order by decode(VAL001,null,0,count(*)) desc, VAL001) as rnk
 from  abcd
 group by VAL001),
ctg as (
 select 
  val, cnt, rnk,
  case when val is NULL then 'unknown'
       when rnk <= 100 then 'top '||rnk
       else 'other' end as category_code
 from cnt)
select 
  max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from  ctg
group by category_code
order by 1
q35jwt9p

q35jwt9p2#

这是你的问题。我想有一个更简单的方法可以得到你想要的:

select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
             row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk
      from (select VAL001 val from abcd )
      group by val
     )
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;

我认为您只需要为中的子查询使用表别名 from 条款:

select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
             row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk
      from (select VAL001 val from abcd
           ) x
      group by val
     ) x
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;

相关问题