我正在尝试修改这个基于group by计算空值的解决方案(第一个答案),唯一的区别是我想要它们的百分比(例如,2016年第1列的30%为空),而不是计数(例如,2016年第1列的6521为空)。我的问题:
WITH t1nulltest AS
( select date_column
,SUM(IF(c1 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c1null
,SUM(IF(c2 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c2null
,SUM(IF(c3 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c3null
,SUM(IF(c4 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c4null
,SUM(IF(c5 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c5null
,row_number() OVER (PARTITION BY date_column) as rowno
from t1)
select
date_column, c1null, c2null,c3null,c4null,c5null from t1nulltest
where rowno =1;
与原始解决方案的唯一区别是 /count(*)
,但这行不通,我想知道为什么。原始查询工作正常。我的查询给出了错误:
Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: line 2:68 Expression not in GROUP BY key 'date_column'
1条答案
按热度按时间dnph8jn41#
我怀疑整个逻辑可以简化为一个简单的聚合查询。
这也可以用
avg()
: