How to compare two tables in Hive based on counts

64jmpszr, posted on 2021-06-26 in Hive

I have the following tables in Hive:

Table_1
ID
1
1
2

Table_2
ID
1
2
2

I want to compare the two tables based on how many times each ID occurs in each of them, and I need output like the following:

ID 
1 - 2 records in Table 1 and 1 record in Table 2
2 - 1 record in Table 1 and 2 records in Table 2

Table 1 is the parent table.
I am currently using the queries below:

select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;

watbbzwu1#

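Just do a full outer join of your two queries on x.id = y.id. An ID that exists in only one table comes back with NULL on the other side, so wrap the counts in COALESCE to report them as 0.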

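-- Count rows per id in each table, then join the two result sets on id.
SELECT COALESCE(x.id, y.id) AS id,
       CONCAT(COALESCE(x.cnt1, 0), ' entries in table 1, ', COALESCE(y.cnt2, 0), ' entries in table 2') AS summary
FROM (SELECT id, COUNT(*) AS cnt1 FROM table_1 GROUP BY id) x
FULL OUTER JOIN (SELECT id, COUNT(*) AS cnt2 FROM table_2 GROUP BY id) y
ON x.id = y.id;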

iih3973s2#

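You can use this Python program to do a full comparison of two Hive tables: https://github.com/bolcom/hive_compared_bq
If you only want a quick comparison based on counts, pass the "--just-count" option (you can also specify the group-by column with "--group-by-column").
If you want a full validation, the script can also show you visually all the differences across all rows and columns.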


bn31dyow3#

Try this. You can use a CASE expression to decide whether each count should read "record" or "records".

SELECT m.id,
       CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
       ' record in table 2')
FROM   (SELECT  id
        FROM   table_1
        UNION
        SELECT id
        FROM   table_2) m
       LEFT JOIN (SELECT Count(*) AS ct,
                         id
                  FROM   table_1
                  GROUP  BY id) a
              ON m.id = a.id
       LEFT JOIN (SELECT Count(*) AS ct,
                         id
                  FROM   table_2
                  GROUP  BY id) b
              ON m.id = b.id;
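
The CASE idea mentioned above is not shown in the query, so here is a minimal sketch of how it might look. It assumes the tables are named table_1 and table_2 as in the question; the column alias summary is made up for illustration, and the explicit CAST to STRING is a precaution rather than something the original answer used.

```sql
-- Sketch only: same structure as the query above, with CASE picking
-- the singular or plural noun for each count.
SELECT m.id,
       CONCAT(CAST(COALESCE(a.ct, 0) AS STRING),
              CASE WHEN COALESCE(a.ct, 0) = 1 THEN ' record' ELSE ' records' END,
              ' in table 1, ',
              CAST(COALESCE(b.ct, 0) AS STRING),
              CASE WHEN COALESCE(b.ct, 0) = 1 THEN ' record' ELSE ' records' END,
              ' in table 2') AS summary
FROM   (SELECT id FROM table_1
        UNION
        SELECT id FROM table_2) m            -- every id seen in either table
       LEFT JOIN (SELECT id, COUNT(*) AS ct
                  FROM   table_1
                  GROUP  BY id) a
              ON m.id = a.id
       LEFT JOIN (SELECT id, COUNT(*) AS ct
                  FROM   table_2
                  GROUP  BY id) b
              ON m.id = b.id;
```

With the sample data from the question, ID 1 should come out as "2 records in table 1, 1 record in table 2" and ID 2 as "1 record in table 1, 2 records in table 2".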
