Counting by CASE within grouped time periods

Posted by dgiusagp on 2021-06-28 in Hive

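I am trying to count the distinct IDs that appear in my data each week, broken down by version type, but I am not sure how to structure the query correctly.
I would like to produce a table roughly like this: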

        1.1     1.2     1.3     1.4
wk1     1       5       4       8
wk2     4       3       9       8
wk3     1       8       0       6

I tried the query below, but it does not run: it asks for the CASE expression to appear in the GROUP BY, and GROUP BY does not accept COUNT().

SELECT
  Case  when version like "1.1%" then Count(distinct ID)
     when version like "1.2%" then Count(distinct ID)
     when version like "1.3%" then Count(distinct ID)
     when version like "1.4%" then Count(distinct ID) end,
  CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) as INT) as week_of_the_year
  FROM db.table
  where timestamp_pst >=  "2016-01-28"
  group by CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) as INT)  
        order by week_of_the_year
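
The week bucketing expression in this query is plain arithmetic, so it can be sanity-checked outside Hive. The Python sketch below mirrors it under the assumption that `datediff` returns whole days and that the cast to INT truncates the fractional part; the function name is only illustrative.

```python
from datetime import date

ANCHOR = date(2016, 1, 3)  # anchor date taken from the query


def week_of_the_year(d: date) -> int:
    # Mirrors CAST(((datediff(d, '2016-01-03') / 7) + 1) AS INT):
    # whole days since the anchor, divided by 7, plus 1, then truncated.
    days = (d - ANCHOR).days
    return int(days / 7 + 1)


print(week_of_the_year(date(2016, 1, 3)))   # first day of week 1
print(week_of_the_year(date(2016, 1, 10)))  # first day of week 2
print(week_of_the_year(date(2016, 1, 28)))  # the WHERE cutoff date
```

With this formula the cutoff date 2016-01-28 is 25 days after the anchor, so the first rows returned fall in week 4 rather than week 1.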

s3fp2yjn1#

SELECT
    -- Each ID is counted at most once per week and version prefix;
    -- CASE without ELSE yields NULL for non-matching rows, which COUNT ignores.
    COUNT(DISTINCT (CASE WHEN version LIKE '1.1%' THEN ID END)) AS v1_1
    ,COUNT(DISTINCT (CASE WHEN version LIKE '1.2%' THEN ID END)) AS v1_2
    ,COUNT(DISTINCT (CASE WHEN version LIKE '1.3%' THEN ID END)) AS v1_3
    ,COUNT(DISTINCT (CASE WHEN version LIKE '1.4%' THEN ID END)) AS v1_4
    ,CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT) AS week_of_the_year
  FROM aws_d3.iaanalytics_detail
  WHERE timestamp_pst >= '2016-01-28'
  GROUP BY CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT)
  ORDER BY week_of_the_year

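You want to use "conditional aggregation", which means the CASE expression goes inside the aggregate function. Because you want COUNT(DISTINCT), you need to either use the DISTINCT keyword inside the aggregate, or build a derived table that contains only distinct values, as the other answer shows. Since the only thing the derived table saves you from repeating is the word DISTINCT, I don't think it is worth complicating the query with one.
Note that SUM(CASE WHEN blah THEN 1 ELSE 0 END) will not work for you, because it sums every occurrence rather than counting distinct values. Aggregate functions also ignore NULLs, and a CASE expression without an ELSE evaluates to NULL when no branch matches.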
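
To make the difference between the two aggregates concrete, here is a minimal sketch that runs both on a few toy rows. It uses Python's built-in `sqlite3` as a stand-in for Hive, and the `events` table, its columns, and its data are hypothetical; only the shape of the aggregate expressions carries over.

```python
import sqlite3

# Toy data: (id, version, week_of_the_year). Table and values are made up.
rows = [
    (1, "1.1.0", 1),
    (1, "1.1.0", 1),  # same id seen twice in the same week
    (2, "1.1.3", 1),
    (3, "1.2.0", 1),
    (3, "1.2.1", 2),
    (3, "1.2.1", 2),
    (3, "1.2.1", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, version TEXT, week_of_the_year INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

query = """
SELECT
    week_of_the_year,
    COUNT(DISTINCT CASE WHEN version LIKE '1.1%' THEN id END) AS distinct_1_1,
    SUM(CASE WHEN version LIKE '1.1%' THEN 1 ELSE 0 END)      AS rows_1_1,
    COUNT(DISTINCT CASE WHEN version LIKE '1.2%' THEN id END) AS distinct_1_2,
    SUM(CASE WHEN version LIKE '1.2%' THEN 1 ELSE 0 END)      AS rows_1_2
FROM events
GROUP BY week_of_the_year
ORDER BY week_of_the_year
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
# (1, 2, 3, 1, 1)
# (2, 0, 0, 1, 3)
```

In week 1 the SUM column reports 3 rows for version 1.1 while only 2 distinct IDs exist, and in week 2 the all-NULL CASE column yields a distinct count of 0.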


ghhaqwfi2#

You can use the COUNT() aggregate function with a conditional CASE expression.

SELECT
    week_of_the_year
  , COUNT(CASE WHEN version LIKE '1.1%' THEN id END) AS v1_1
  , COUNT(CASE WHEN version LIKE '1.2%' THEN id END) AS v1_2
  , COUNT(CASE WHEN version LIKE '1.3%' THEN id END) AS v1_3
  , COUNT(CASE WHEN version LIKE '1.4%' THEN id END) AS v1_4
FROM (
  SELECT
    DISTINCT
      id
    , version
    , CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) as INT) as week_of_the_year
  FROM aws_d3.iaanalytics_detail
  where timestamp_pst >= '2016-01-28'
  ) t
GROUP BY week_of_the_year
ORDER BY week_of_the_year

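Note that the DISTINCT part of the query happens in the derived table t. The derived table is not strictly needed, but I find it a cleaner solution because the GROUP BY clause does not have to repeat the same expression, which makes the query more readable. It also moves the de-duplication out of the aggregate functions.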
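
As a rough check of the derived-table form, the sketch below compares it with COUNT(DISTINCT CASE ...) on toy data, again using Python's `sqlite3` in place of Hive with a hypothetical `events` table. One caveat it illustrates: the derived table de-duplicates on (id, version, week), so an ID that appears under two different version strings with the same prefix in one week is counted twice.

```python
import sqlite3

# Toy data: id 3 appears as both 1.2.0 and 1.2.1 in week 1 (made-up values).
rows = [
    (1, "1.1.0", 1),
    (1, "1.1.0", 1),
    (2, "1.1.3", 1),
    (3, "1.2.0", 1),
    (3, "1.2.1", 1),
    (3, "1.2.1", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, version TEXT, week_of_the_year INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# De-duplicate first in a derived table, then count with plain COUNT.
derived = conn.execute("""
SELECT
    week_of_the_year,
    COUNT(CASE WHEN version LIKE '1.1%' THEN id END) AS v1_1,
    COUNT(CASE WHEN version LIKE '1.2%' THEN id END) AS v1_2
FROM (
    SELECT DISTINCT id, version, week_of_the_year FROM events
) t
GROUP BY week_of_the_year
ORDER BY week_of_the_year
""").fetchall()

# De-duplicate inside the aggregate instead.
direct = conn.execute("""
SELECT
    week_of_the_year,
    COUNT(DISTINCT CASE WHEN version LIKE '1.1%' THEN id END) AS v1_1,
    COUNT(DISTINCT CASE WHEN version LIKE '1.2%' THEN id END) AS v1_2
FROM events
GROUP BY week_of_the_year
ORDER BY week_of_the_year
""").fetchall()

print(derived)  # [(1, 2, 2), (2, 0, 1)]
print(direct)   # [(1, 2, 1), (2, 0, 1)]
```

If each ID only ever reports one version string per week, the two forms agree; otherwise the choice depends on whether such an ID should count once or once per exact version.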


raogr8fs3#

Try this:

SELECT
  -- SUM adds 1 for every matching row, so repeated IDs are counted each time
  SUM(CASE WHEN version LIKE '1.1%' THEN 1 ELSE 0 END) AS v1_1,
  SUM(CASE WHEN version LIKE '1.2%' THEN 1 ELSE 0 END) AS v1_2,
  SUM(CASE WHEN version LIKE '1.3%' THEN 1 ELSE 0 END) AS v1_3,
  SUM(CASE WHEN version LIKE '1.4%' THEN 1 ELSE 0 END) AS v1_4,
  CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT) AS week_of_the_year
  FROM aws_d3.iaanalytics_detail
  WHERE timestamp_pst >= '2016-01-28'
  GROUP BY CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT)
  ORDER BY week_of_the_year
