Counts based on separate GROUP BY columns, with conditions

Posted by mepcadol on 2021-06-26 in Hive

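I am trying to combine three separate queries into a single query that still produces the same results, just as one table. columnA and columnB are both actually dates in 'yyyy-mm-dd' format, so ideally the final result would be a single column of dates plus a separate count from each query.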

select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA

select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB

select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB

#1 b1zrtrql

I was able to get it working with the following:

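with pullA as
(
  select columnA, count(*) as A_count
  from data.table
  where timestamp between '2017-01-01' and '2017-01-07'
  group by columnA
),
pullB as
(
  select columnB, count(*) as B_count
  from data.table
  where timestamp between '2017-01-01' and '2017-01-07'
  group by columnB
),
pullC as
(
  -- distinct columnC values per columnB date, as in the third original query
  select columnB, count(distinct columnC) as C_count
  from data.table
  where timestamp between '2017-01-01' and '2017-01-07'
  and columnX in ('itemA', 'itemB')
  group by columnB
)
-- qualify every column: pullB and pullC both expose columnB
select pullB.columnB, pullA.A_count, pullB.B_count, pullC.C_count
from pullB
left join pullA
on pullB.columnB = pullA.columnA
left join pullC
on pullB.columnB = pullC.columnB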

Is this approach more or less efficient than the union or subquery approaches?
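
Separately from efficiency, one property of the join approach above is worth checking: the joins start from pullB, so a date that occurs only in columnA has no row to attach to and drops out of the result. The sketch below reproduces that effect with Python's sqlite3 module standing in for Hive. The table name `events` and the two sample rows are made up for illustration, and the date filter is left out to keep it short.

```python
import sqlite3

# Hypothetical stand-in for data.table; SQLite is used only to check the join shape.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (columnA text, columnB text)")
conn.executemany(
    "insert into events values (?, ?)",
    [("2017-01-01", "2017-01-02"), ("2017-01-02", "2017-01-03")],
)

query = """
with pullA as (
    select columnA, count(*) as A_count from events group by columnA
),
pullB as (
    select columnB, count(*) as B_count from events group by columnB
)
select b.columnB, a.A_count, b.B_count
from pullB b
left join pullA a on b.columnB = a.columnA
order by b.columnB
"""
joined = conn.execute(query).fetchall()

# 2017-01-01 exists only in columnA, so it never appears in the output.
for row in joined:
    print(row)
```

If every date is guaranteed to appear in columnB this does not matter; otherwise the joins need to start from the full set of dates in both columns.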


#2 ie3xauqp

Go with UNION ALL:

select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA
UNION ALL
select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB
UNION ALL
select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB
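
A plain UNION ALL stacks the three result sets on top of each other, so the rows no longer say which query they came from. One possible way to get back to one row per date with three count columns is to tag each branch and pivot with conditional sums. This is a sketch under assumptions, not something from the answer itself: it uses Python's sqlite3 module as a stand-in for Hive, and the table name `events`, the column `ts` (in place of `timestamp`), the `src` tag and the sample rows are all invented for illustration.

```python
import sqlite3

# Hypothetical stand-in for data.table; SQLite is used only to check the query shape.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (ts text, columnA text, columnB text, columnC text, columnX text)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?, ?)",
    [
        ("2017-01-01", "2017-01-01", "2017-01-02", "c1", "itemA"),
        ("2017-01-02", "2017-01-01", "2017-01-02", "c1", "ItemB"),
        ("2017-01-03", "2017-01-02", "2017-01-03", "c2", "other"),
        ("2017-02-01", "2017-01-01", "2017-01-02", "c3", "itemA"),  # outside the date range
    ],
)

query = """
select dte,
       sum(case when src = 'a' then cnt else 0 end) as a_count,
       sum(case when src = 'b' then cnt else 0 end) as b_count,
       sum(case when src = 'c' then cnt else 0 end) as c_count
from (
    -- each branch is one of the three original queries plus a source tag
    select columnA as dte, 'a' as src, count(*) as cnt
    from events
    where ts between '2017-01-01' and '2017-01-07'
    group by columnA
    union all
    select columnB, 'b', count(*)
    from events
    where ts between '2017-01-01' and '2017-01-07'
    group by columnB
    union all
    select columnB, 'c', count(distinct columnC)
    from events
    where ts between '2017-01-01' and '2017-01-07'
      and columnX in ('itemA', 'ItemB')
    group by columnB
) u
group by dte
order by dte
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Because the outer query groups over the union of all dates, a date that shows up in only one of the columns still gets a row, with 0 for the counts it does not have.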

#3 r1zhe5dt

This is what you want:

select columnA, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnA
Union All
select columnB, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnB
Union All
select columnB, count(distinct columnC) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' and columnX in ('itemA','ItemB') group by columnB

#4 myss37ts

The following query does what you are trying to do:

select d.dte, coalesce(a.cnt, 0) as acnt, coalesce(b.cnt, 0) as bcnt,
       coalesce(b.c_cnt, 0) as c_cnt
from (select columnA as dte from data.table where timestamp between '2017-01-01' and '2017-01-07'
      union
      select columnB from data.table where timestamp between '2017-01-01' and '2017-01-07'
     ) d left join
     (select columnA, count(*) as cnt
      from data.table
      where timestamp between '2017-01-01' and '2017-01-07'
      group by columnA
     ) a
     on d.dte = a.columnA left join
     (select columnB, count(*) as cnt,
             count(distinct case when columnX in ('itemA','ItemB') then columnC end) as c_cnt
      from data.table
      where timestamp between '2017-01-01' and '2017-01-07'
      group by columnB
     ) b
     on d.dte = b.columnB;

I think this is Hive-compatible, but Hive occasionally differs from other SQL dialects in surprising ways.
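
Since Hive compatibility is not guaranteed here, the shape of the query can at least be checked on a small data set before running it on a cluster. The sketch below does that with Python's sqlite3 module. SQLite is only a stand-in, so a passing run says nothing about Hive itself. The table name `events` and the sample rows are hypothetical, and the date filter is omitted for brevity.

```python
import sqlite3

# Hypothetical stand-in for data.table; SQLite is not Hive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (columnA text, columnB text, columnC text, columnX text)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [
        ("2017-01-01", "2017-01-02", "c1", "itemA"),
        ("2017-01-01", "2017-01-02", "c1", "ItemB"),
        ("2017-01-02", "2017-01-03", "c2", "other"),
    ],
)

query = """
select d.dte, coalesce(a.cnt, 0) as acnt, coalesce(b.cnt, 0) as bcnt,
       coalesce(b.c_cnt, 0) as c_cnt
from (select columnA as dte from events
      union
      select columnB from events
     ) d
left join (select columnA, count(*) as cnt from events group by columnA) a
  on d.dte = a.columnA
left join (select columnB, count(*) as cnt,
                  count(distinct case when columnX in ('itemA', 'ItemB')
                                      then columnC end) as c_cnt
           from events group by columnB) b
  on d.dte = b.columnB
order by d.dte
"""
spine_rows = conn.execute(query).fetchall()
for row in spine_rows:
    print(row)
```

The inner `union` builds the full list of dates from both columns, which is why 2017-01-01 (present only in columnA) and 2017-01-03 (present only in columnB) both get a row.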
