I have a query:
select y, m, count(distinct id) as y_m_cnt, *other columns* from tab group by y, m, *other columns*
),
t2 as (
select y, count(distinct id) as y_cnt, *other columns* from tab group by y, *other columns*
)
select * t1.y, t1.m, t1.y_m_cnt, t2.y_cnt, t1.*other columns* from t1 left join t2 on t1.y=t2.y, t1.*other columns* = t2.*other columns*
Where y is a year and m is a month. The idea is that i want to count unique ids in two different aggregations, which different only in period (one is year + month, another is year only). The code above is working and almost ok, but i dislike it. Is it possible to re-write this code and make it shorter?
P.S. This code is simplified. In reality it has much more columns, so i need to select same columns twice and then i have terribly big join condition (query is more than 100 lines). That's why I'm looking for simplifying it. Found some solutions with partition but they don't cover problem of unique ids.
Edit: Hoped it would work without full query, but seems not. The problem is I need year_count only because of aggregating without mm, dd. Is it possible to make this query shorter?
with full_count as (
select param1,
yy,
mm,
dd,
param2,
param3,
param4,
count(distinct id1) as cnt_id1,
count(distinct id2) as cnt_id2,
count(distinct id3) as cnt_id3,
cast(null as string) as param5,
cast(null as string) as param6,
cast(null as string) as param7,
cast(null as string) as param8,
param9,
param10,
param11,
param12,
param13,
param14,
param15,
param16,
param17,
param18,
param19,
param20,
param21,
param22
from table
group by param1,
yy,
mm,
dd,
param2,
param3,
param4,
param9,
param10,
param11,
param12,
param13,
param14,
param15,
param16,
param17,
param18,
param19,
param20,
param21,
param22
),
year_count as (
select param1,
yy,
param2,
param3,
param4,
count(distinct id1) as cnt_id1_yy,
count(distinct id2) as cnt_id2_yy,
count(distinct id3) as cnt_id3_yy,
cast(null as string) as param5,
cast(null as string) as param6,
cast(null as string) as param7,
cast(null as string) as param8,
param9,
param10,
param11,
param12,
param13,
param14,
param15,
param16,
param17,
param18,
param19,
param20,
param21,
param22
from table
group by param1,
yy,
param2,
param3,
param4,
param9,
param10,
param11,
param12,
param13,
param14,
param15,
param16,
param17,
param18,
param19,
param20,
param21,
param22
)
select fc.param1 as param1,
fc.yy as yy,
fc.mm as mm,
fc.dd as dd,
fc.param2 as param2,
fc.param3 as param3,
fc.param4 as param4,
fc.cnt_id1 as cnt_id1,
fc.cnt_id2 as cnt_id2,
fc.cnt_id3 as cnt_id3,
yc.cnt_id1_yy as cnt_id1_yy,
yc.cnt_id2_yy as cnt_id2_yy,
yc.cnt_id3_yy as cnt_id3_yy,
fc.param5 as param5,
fc.param6 as param6,
fc.param7 as param7,
fc.param8 as param8,
fc.param9 as param9,
fc.param10 as param10,
fc.param11 as param11,
fc.param12 as param12,
fc.param13 as param13,
fc.param14 as param14,
fc.param15 as param15,
fc.param16 as param16,
fc.param17 as param17,
fc.param18 as param18,
fc.param19 as param19,
fc.param20 as param20,
fc.param21 as param21,
fc.param22 as param22
from full_count fc
left join year_count yc on
coalesce(fc.param1, 'tmp') = coalesce(yc.param1, 'tmp') and
coalesce(fc.yy, 'tmp') = coalesce(yc.yy, 'tmp') and
coalesce(fc.param2, 'tmp') = coalesce(yc.param2, 'tmp') and
coalesce(fc.param3, 'tmp') = coalesce(yc.param3, 'tmp') and
coalesce(fc.param4, 'tmp') = coalesce(yc.param4, 'tmp') and
coalesce(fc.param6, 'tmp') = coalesce(yc.param6, 'tmp') and
coalesce(fc.param9, 'tmp') = coalesce(yc.param9, 'tmp') and
coalesce(fc.param10, 'tmp') = coalesce(yc.param10, 'tmp') and
coalesce(fc.param11, 'tmp') = coalesce(yc.param11, 'tmp') and
coalesce(fc.param12, 'tmp') = coalesce(yc.param12, 'tmp') and
coalesce(fc.param13, 'tmp') = coalesce(yc.param13, 'tmp') and
coalesce(fc.param14, 'tmp') = coalesce(yc.param14, 'tmp') and
coalesce(fc.param15, 'tmp') = coalesce(yc.param15, 'tmp') and
coalesce(fc.param16, 'tmp') = coalesce(yc.param16, 'tmp') and
coalesce(fc.param17, 'tmp') = coalesce(yc.param17, 'tmp') and
coalesce(fc.param18, 'tmp') = coalesce(yc.param18, 'tmp') and
coalesce(fc.param19, 'tmp') = coalesce(yc.param19, 'tmp') and
coalesce(fc.param20, 'tmp') = coalesce(yc.param20, 'tmp') and
coalesce(fc.param21, 'tmp') = coalesce(yc.param21, 'tmp') and
coalesce(fc.param22, 'tmp') = coalesce(yc.param22, 'tmp')
1条答案
按热度按时间lyr7nygr1#
您不必指定正在使用的数据库,但许多数据库都支持
count(distinct)
作为窗口函数:在窗口函数中不支持
distinct
的数据库中有一些变通方法。