如果配置单元表中存在多个分区,则spark sql(通过hivecontext的配置单元查询)insert overwrite不会覆盖现有数据

aiazj4mn  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(345)

//hive-1.2.1000.2.6.1.0-129我们正在尝试用多个分区插入overwrite test5表(https://cwiki.apache.org/confluence/display/hive/languagemanual+dml)插入覆盖将覆盖表或分区中的任何现有数据。但在insert overwrite查询被触发之后,我们仍然得到一些旧数据。下面是示例执行和输出。
//spark-2.1.1在spark-2.1.1中运行hivecontext时,我们得到了相同的输出

CREATE TABLE dbtest.test5 (emp_id INT) PARTITIONED BY (depart_id INT,depart_name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'externalpath'; 

INSERT INTO TABLE dbtest.test5  PARTITION (depart_id,depart_name) SELECT emp_id,depart_id,depart_name from dbtest.tempTableHive1; 

4       123     Dev 
5       123     Dev 
6       123     Test 
7       567     Test 

INSERT INTO TABLE dbtest.test5  PARTITION (depart_id,depart_name) SELECT emp_id,depart_id,depart_name from dbtest.tempTableHive2; 
4       123     Dev 
5       123     Dev 
1       123     Dev 
2       123     Dev 
6       123     Test 
3       123     Test 
7       567     Test 

INSERT OVERWRITE TABLE dbtest.test5  PARTITION (depart_id,depart_name) SELECT emp_id,depart_id,depart_name from dbtest.tempTableHive3; 

8       123     Dev 
9       123     Dev 
10      123     Dev 
6       123     Test 
3       123     Test 
7       567     Test

是否有任何错误的代码或它是apache配置单元的问题?

eit6fx6z

eit6fx6z1#

当您指定insert overwrite时,配置单元将覆盖分区。请参阅下面我从cloudera快速启动vm的输出。

hive> SELECT * FROM tempTableHive1;
OK
4   123 Dev
5   567 Test
Time taken: 0.048 seconds, Fetched: 2 row(s)
hive> INSERT INTO TABLE test5  PARTITION (depart_id,depart_name) SELECT emp_id,depart_id,depart_name from tempTableHive1; 

hive> SELECT * FROM test5;
OK
4   123 Dev
5   567 Test
Time taken: 0.065 seconds, Fetched: 2 row(s)

hive> SELECT * FROM tempTableHive2;
OK
4   123 Dev
6   123 Dev
Time taken: 0.047 seconds, Fetched: 2 row(s)

hive> INSERT INTO TABLE test5  PARTITION (depart_id,depart_name) 
    > SELECT emp_id,depart_id,depart_name from tempTableHive2; 

hive> SELECT * FROM test5;
OK
4   123 Dev
4   123 Dev
6   123 Dev
5   567 Test
Time taken: 0.057 seconds, Fetched: 4 row(s)

hive> SELECT * FROM tempTableHive3;
OK
100 123 Dev
101 123 Dev

hive> INSERT OVERWRITE TABLE test5  PARTITION (depart_id,depart_name) 
    > SELECT emp_id,depart_id,depart_name from tempTableHive3;

hive> SELECT * FROM test5;
OK
100 123 Dev
101 123 Dev
5   567 Test
Time taken: 0.072 seconds, Fetched: 3 row(s)

如果仍然有问题,最好的调试方法是检查hdfs文件。每个部门id/部门名称组合应有一个文件。example/user/hive/warehouse/test5/depart\u id=123/depart\u name=dev。由于它们是文本文件,您可以快速“cat”它们以查看内容。让我们知道你进展如何。

相关问题