Unable to create unique Hive partitions

Posted by w8ntj3qf on 2021-05-27 in Hadoop


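I am unable to create unique partitions. When I load the data, Hive keeps creating a separate partition for each date, even when the dates are the same.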

create table product_order1(id int,user_id int,amount int,product string, city string, txn_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

OK
Time taken: 0.133 seconds

LOAD DATA LOCAL INPATH 'txn' INTO TABLE product_order1;
    Loading data to table oct19.product_order1
    Table oct19.product_order1 stats: [numFiles=1, totalSize=303]
OK

Time taken: 0.426 seconds

hive> set hive.exec.dynamic.partition = true;
hive> set hive.exec.dynamic.partition.mode = true;
hive> create table dyn_part(id int,user_id int,amount int,product string,city string) PARTITIONED BY(txn_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK

Time taken: 0.14 seconds

hive> INSERT OVERWRITE TABLE dyn_part PARTITION(txn_date) select id,user_id,amount,product,city,txn_date from product_order1;

The result I get:

Loading data to table oct19.dyn_part partition (txn_date=null)
     Time taken for load dynamic partitions : 944
    Loading partition {txn_date=04-02-2015}
    Loading partition {txn_date= 03-04-2015}
    Loading partition {txn_date=01-02-2015}
    Loading partition {txn_date=03-04-2015}
    Loading partition {txn_date= 01-01-2015}
    Loading partition {txn_date=01-01-2015}
    Loading partition {txn_date= 01-02-2015}
     Time taken for adding to write entity : 5
Partition oct19.dyn_part{txn_date= 01-01-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
Partition oct19.dyn_part{txn_date= 01-02-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
Partition oct19.dyn_part{txn_date= 03-04-2015} stats: [numFiles=1, numRows=2, totalSize=50, rawDataSize=48]
Partition oct19.dyn_part{txn_date=01-01-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=01-02-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=03-04-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=04-02-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.03 sec   HDFS Read: 4166 HDFS Write: 614 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 30 msec

hadoop Hive Date hive-partitions

Source: https://stackoverflow.com/questions/58255001/unable-to-create-hive-unique-paritions

1 Answer


ttcibm8c1#

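I noticed that some of the dates contain a leading space and some do not: txn_date= 03-04-2015 versus txn_date=03-04-2015. The two strings are different values, so each one gets its own partition. Try adding trim: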

INSERT OVERWRITE TABLE dyn_part PARTITION(txn_date) 
select id, user_id, amount, product, city, trim(txn_date) as txn_date 
from product_order1;
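
To see why the untrimmed values end up in separate partitions, here is a small Python sketch (an emulation for illustration, not Hive itself) that uses the seven partition values from the log in the question. It assumes partition values are compared as exact strings; the helper name distinct_partitions is made up for this example.

```python
# Partition values copied from the "Loading partition" lines in the question.
raw_values = [
    "04-02-2015",
    " 03-04-2015",
    "01-02-2015",
    "03-04-2015",
    " 01-01-2015",
    "01-01-2015",
    " 01-02-2015",
]


def distinct_partitions(values, trim=False):
    """Return the sorted set of partition keys, optionally trimming spaces.

    strip(" ") removes leading and trailing spaces, which is roughly what
    Hive's trim() does to a string (assumption: only spaces are involved).
    """
    if trim:
        values = [v.strip(" ") for v in values]
    return sorted(set(values))


if __name__ == "__main__":
    print(len(distinct_partitions(raw_values)))        # 7 keys without trimming
    print(distinct_partitions(raw_values, trim=True))  # 4 keys after trimming
```

With trimming, the seven keys collapse to the four real dates, which is what the query with trim(txn_date) should produce as partitions.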

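It is also better to use the Hive-compatible date format yyyy-MM-dd, which sorts correctly as a string.
To reformat the date and remove the spaces at the same time, you can use regexp_replace. If your current format is MM-dd-yyyy, you can convert it like this: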

select regexp_replace(' 03-04-2015','.*?(\\d{2})-(\\d{2})-(\\d{4})','$3-$1-$2') --fix accordingly if it is dd-MM-yyyy. In this case it should be '$3-$2-$1' in the replacement template.

Returns:

2015-03-04

Then load the table like this:

INSERT OVERWRITE TABLE dyn_part PARTITION(txn_date) 
select id, user_id, amount, product, city, 
       regexp_replace(txn_date,'.*?(\\d{2})-(\\d{2})-(\\d{4})','$3-$1-$2') as txn_date 
  from product_order1;

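The regexp means:

.*? - any character, zero or more times (non-greedy); this consumes the leading space
(\\d{2}) - first group of 2 digits, referenced as $1 in the replacement
- - a dash
(\\d{2}) - second group of 2 digits, referenced as $2 in the replacement
- - a dash
(\\d{4}) - third group of 4 digits, referenced as $3 in the replacement

The replacement template '$3-$1-$2' takes the groups from the regexp in the required order and separates them with dashes. Here $3 is assumed to be the year, $1 the month and $2 the day. Arrange the groups so that the result is yyyy-MM-dd; I cannot tell from the data which format you are using, MM-dd-yyyy or dd-MM-yyyy.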
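
The group reordering can be tried outside Hive. The following is a minimal Python sketch of the same pattern, not the Hive function itself: Python's re.sub uses \1, \2, \3 where Hive's regexp_replace uses $1, $2, $3, and a raw string needs only \d where the Hive literal needs \\d. The function name to_iso and the day_first flag are assumptions introduced for this example.

```python
import re

# Same pattern as in the Hive query above, written as a Python raw string.
DATE_PATTERN = re.compile(r".*?(\d{2})-(\d{2})-(\d{4})")


def to_iso(value, day_first=False):
    """Reorder the captured groups into yyyy-MM-dd.

    day_first=False treats the input as MM-dd-yyyy ('$3-$1-$2' in Hive);
    day_first=True treats it as dd-MM-yyyy ('$3-$2-$1' in Hive).
    """
    template = r"\3-\2-\1" if day_first else r"\3-\1-\2"
    return DATE_PATTERN.sub(template, value)


if __name__ == "__main__":
    print(to_iso(" 03-04-2015"))                  # 2015-03-04
    print(to_iso(" 03-04-2015", day_first=True))  # 2015-04-03
```

In this Python version the leading space disappears because .*? is part of the match, but anything after the year (for example a trailing space) is left in place, so wrapping the column in trim first may still be worthwhile.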
