Counting common users across multiple tables

Posted by deyfvvtc on 2021-06-24 in Hive

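I have 4 tables, as shown below.

Basically, I want to know how many of the users in table 1 also appear in tables 2, 3, and 4. Likewise, for table 2, I want to know how many of its users appear in tables 1, 3, and 4, and the same goes for tables 3 and 4.
In other words, all possible combinations. The final result I want looks like this:

One approach I tried is to LEFT JOIN table1 to each of the other tables and then COUNT, which gives the first row of the output. But repeating this for every possible combination is not optimal, so I am looking for other options.
My code for that approach: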

SELECT 
COUNT(DISTINCT A.id) table1,
COUNT(DISTINCT B.id) table2,
COUNT(DISTINCT C.id) table3,
COUNT(DISTINCT D.id) table4
FROM table1 A
LEFT JOIN table2 B
ON A.id = B.id

LEFT JOIN table3 C
ON A.id = C.id

LEFT JOIN table4 D
ON A.id = D.id

DB Fiddle (this fiddle is for MySQL; I am looking for a generic SQL-based approach rather than anything DB-specific)

Answer #1 (lx0bsm1f)

Use UNION ALL (demo):

select 'table1' as col1, count(table1.id) as tbl1, count(table2.id) as tbl2, count(table3.id) as tbl3, count(table4.id) as tbl4
from table1
left join table2 on table1.id=table2.id
left join table3 on table1.id=table3.id
left join table4 on table1.id=table4.id
union all
select 'table2' ,count(table1.id),count(table2.id),count(table3.id),count(table4.id) 
from table2
left join table1 on table2.id=table1.id
left join table3 on table2.id=table3.id
left join table4 on table2.id=table4.id
union all
select 'table3' ,count(table1.id),count(table2.id),count(table3.id),count(table4.id) 
from table3
left join table1 on table3.id=table1.id
left join table2 on table3.id=table2.id
left join table4 on table3.id=table4.id
union all
select 'table4' ,count(table1.id),count(table2.id),count(table3.id),count(table4.id) 
from table4
left join table1 on table4.id=table1.id
left join table2 on table4.id=table2.id
left join table3 on table4.id=table3.id

Output:

col1    tbl1    tbl2    tbl3    tbl4
table1   8      3        2       2
table2   3      6        1       0
table3   2      1        5       0
table4   2      0        0       4
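
The LEFT JOIN plus UNION ALL pattern above can be tried without a database server. The following is a minimal sketch using Python's built-in sqlite3 module. The sample ids and the helper name `overlap_matrix_union_all` are made up for illustration (they are not the asker's data); only the table names and the `id` column come from the thread. It swaps in COUNT(DISTINCT ...), as in the question, so that duplicate ids would not inflate the counts.

```python
import sqlite3

# Hypothetical sample data: table name -> user ids (not the asker's data).
SAMPLE = {
    "table1": [1, 2, 3, 4],
    "table2": [2, 3, 5],
    "table3": [3, 6],
    "table4": [1, 7],
}


def overlap_matrix_union_all(sample):
    """Build one LEFT JOIN branch per base table and glue them with UNION ALL."""
    conn = sqlite3.connect(":memory:")
    names = list(sample)
    for name, ids in sample.items():
        # Table names are interpolated directly, so they must be trusted identifiers.
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")
        conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in ids])

    branches = []
    for base in names:
        counts = ", ".join(f"COUNT(DISTINCT {n}.id)" for n in names)
        joins = " ".join(
            f"LEFT JOIN {n} ON {base}.id = {n}.id" for n in names if n != base
        )
        branches.append(f"SELECT '{base}', {counts} FROM {base} {joins}")

    rows = conn.execute(" UNION ALL ".join(branches)).fetchall()
    conn.close()
    # Key by base table name so the result does not depend on row order.
    return {row[0]: row[1:] for row in rows}


for base, counts in overlap_matrix_union_all(SAMPLE).items():
    print(base, counts)
```

Each branch still scans and joins all of the tables, so the amount of SQL and work grows with the number of tables; generating the branches in a loop only hides the repetition.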

Answer #2 (flvlnr44)

I would suggest:

with t as (
      select 'table1' as which, id from table1 union all
      select 'table2' as which, id from table2 union all
      select 'table3' as which, id from table3 union all
      select 'table4' as which, id from table4
     )
select ta.which,
       sum(case when tb.which = 'table1' then 1 else 0 end) as cnt_table1,
       sum(case when tb.which = 'table2' then 1 else 0 end) as cnt_table2,
       sum(case when tb.which = 'table3' then 1 else 0 end) as cnt_table3,
       sum(case when tb.which = 'table4' then 1 else 0 end) as cnt_table4
from t ta left join
     t tb
     on ta.id = tb.id
group by ta.which;

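Note: this assumes that id is unique within each table. Given the column name and the sample data, that is a reasonable assumption. However, if there are duplicates, you can change the UNION ALL in the CTE to UNION.
This structure also generalizes easily to additional tables.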
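
As a rough illustration of that generalization, the sketch below builds the CTE query from a list of table names and runs it on SQLite through Python's built-in sqlite3 module. The helper name `build_overlap_query` and the sample ids are hypothetical, and the three-table setup is chosen only to show a different table count. It uses UNION rather than UNION ALL in the CTE, so the duplicate id in the sample data is counted once, as the note above describes.

```python
import sqlite3


def build_overlap_query(tables):
    """Return SQL that counts, for each pair of tables, the ids they share.

    Table names are interpolated directly, so they must be trusted identifiers.
    """
    # UNION (not UNION ALL) removes duplicate (which, id) pairs.
    cte = " UNION ".join(f"SELECT '{t}' AS which, id FROM {t}" for t in tables)
    sums = ", ".join(
        f"SUM(CASE WHEN tb.which = '{t}' THEN 1 ELSE 0 END) AS cnt_{t}"
        for t in tables
    )
    return (
        f"WITH t AS ({cte}) "
        f"SELECT ta.which, {sums} "
        "FROM t ta LEFT JOIN t tb ON ta.id = tb.id "
        "GROUP BY ta.which ORDER BY ta.which"
    )


# Hypothetical sample data; table1 deliberately contains a duplicate id (2).
sample = {"table1": [1, 2, 2, 3], "table2": [2, 3, 4], "table3": [3, 5]}

conn = sqlite3.connect(":memory:")
for name, ids in sample.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in ids])

rows = conn.execute(build_overlap_query(list(sample))).fetchall()
conn.close()

for row in rows:
    print(row)
```

Adding another table only requires adding its name to the list; the self-join on the combined set stays the same.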
