向主键添加列将删除唯一性

xpszyzbs  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(354)

我有大约3400万行,每行中有23列 store_sales tpc ds数据集中的表。
我有一个 composite primary key 列数 ss_item_sk 以及 ss_ticket_number .
运行查询时 SELECT count(DISTINCT <primary key>) .. 我可以看到它输出表中的行总数。
现在我添加另一个列以及 primary key ,即 ss_sold_date_sk .
在这之后,如果我 count 查询时,我得到的打印行数比以前少了**。有人能举例向我解释一下为什么会这样吗?

热释光;博士

向复合主键添加列什么时候会停止使其唯一?

krugob8w

krugob8w1#

演示

create table mytable (c1 string,c2 string);
insert into mytable values ('A','A'),('B',null),('C',null);
select count(distinct c1) as count_distinct from mytable;

+----------------+
| count_distinct |
+----------------+
|              3 |
+----------------+

如预期-3个不同的值-“a”、“b”和“c”

select count(distinct concat(c1,c2)) as count_distinct from mytable;

+----------------+
| count_distinct |
+----------------+
|              1 |
+----------------+

一如预期。为什么?-请参阅下一个查询

select c1,c2,concat(c1,c2) as concat_c1_c2 from mytable;

+----+------+--------------+
| c1 |  c2  | concat_c1_c2 |
+----+------+--------------+
| A  | A    | AA           |
| B  | NULL | NULL         |
| C  | NULL | NULL         |
+----+------+--------------+

用null连接产生null

select count(distinct c1,c2) as count_distinct from mytable;

+----------------+
| count_distinct |
+----------------+
|              1 |
+----------------+

缺陷!!
下面是一个解决问题的方法:

select count(distinct struct(c1,c2)) as count_distinct from mytable;

+----------------+
| count_distinct |
+----------------+
|              3 |
+----------------+

相关问题