hive写入分区和默认值

jobtbby3  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(472)

我有一个日期时间表,事实表和一个分区表。目标是将date\u time连接到事实并插入到分区表中。我可以证实它在工作 /apps/hive/warehouse/dbname.db/p_tbl/p_year=2016/p_month=01/p_day=01 以及其他几个年,月,日文件夹存在。不过我也有 p_year=__HIVE_DEFAULT_PARTITION__/p_month=__HIVE_DEFAULT_PARTITION__/p_day=__HIVE_DEFAULT_PARTITION__ 充满了应该有自己分区的数据。为什么某些年、月、日分区在工作,而其他日期则在工作 HIVE_DEFAULT ? 是的,我为我的帖子改名为generic,而不是真正的db,tbl,col名称。。。

CREATE EXTERNAL TABLE ${DB_NAME_HIVE}.date_time(
date_time_key bigint,
label_yyyy varchar(32),
label_mm varchar(32),
label_dd varchar(32)
)
row format delimited
fields terminated by ','
lines terminated by '\n'
LOCATION '${TARGET_PATH}date_time'
;

CREATE EXTERNAL TABLE ${DB_NAME_HIVE}.p_table(
date_time_key bigint,
column_0 varchar(50),
column_1 char(32),
column_2 timestamp,
column_3 bigint,
column_4 tinyint
)
PARTITIONED BY(p_year string, p_month string, p_day string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS ORC
LOCATION '${TARGET_PATH}my_table'
;

USE ${DB_NAME_HIVE};
INSERT INTO TABLE ${DB_NAME_HIVE}.p_tbl PARTITION(p_year, p_month, p_day)
SELECT
mt.date_time_key,
mt.col_0,
mt.col_1,
mt.col_2,
mt.col_3,
mt.col_4,
dt.label_yyyy,
dt.label_mm,
dt.label_dd
FROM my_tbl mt
LEFT OUTER JOIN date_time dt on (mt.date_time_key = dt.date_time_key)
;

INFO  :          Time taken for load dynamic partitions : 12557
INFO  :         Loading partition {p_year=2016, p_month=11, p_day=15}
INFO  :         Loading partition {p_year=2016, p_month=07, p_day=20}
INFO  :         Loading partition {p_year=2014, p_month=08, p_day=06}
INFO  :         Loading partition {p_year=2015, p_month=04, p_day=05}
INFO  :         Loading partition {p_year=2012, p_month=10, p_day=31}
INFO  :         Loading partition {p_year=2014, p_month=04, p_day=16}
INFO  :         Loading partition {p_year=2014, p_month=09, p_day=06}
INFO  :         Loading partition {p_year=2014, p_month=09, p_day=09}
INFO  :         Loading partition {p_year=2012, p_month=11, p_day=22}
INFO  :         Loading partition {p_year=2014, p_month=11, p_day=19}
INFO  :         Loading partition {p_year=2014, p_month=09, p_day=03}
INFO  :         Loading partition {p_year=2013, p_month=12, p_day=24}
INFO  :         Loading partition {p_year=2014, p_month=10, p_day=29}
INFO  :         Loading partition {p_year=2015, p_month=09, p_day=04}
INFO  :         Loading partition {p_year=2015, p_month=05, p_day=11}
INFO  :         Loading partition {p_year=2016, p_month=06, p_day=13}
INFO  :         Loading partition {p_year=2014, p_month=09, p_day=24}
INFO  :         Loading partition {p_year=2014, p_month=10, p_day=21}
INFO  :         Loading partition {p_year=2016, p_month=01, p_day=06}
INFO  :         Loading partition {p_year=2014, p_month=11, p_day=05}
INFO  :         Loading partition {p_year=2012, p_month=12, p_day=04}
INFO  :         Loading partition {p_year=2016, p_month=11, p_day=25}
INFO  :         Loading partition {p_year=2014, p_month=10, p_day=13}
INFO  :         Loading partition {p_year=2013, p_month=06, p_day=21}
INFO  :         Loading partition {p_year=2013, p_month=06, p_day=27}
INFO  :         Loading partition {p_year=2014, p_month=10, p_day=22}
INFO  :         Loading partition {p_year=2016, p_month=11, p_day=22}
INFO  :         Loading partition {p_year=2012, p_month=10, p_day=26}
INFO  :         Loading partition {p_year=2014, p_month=08, p_day=28}
INFO  :         Loading partition {p_year=2013, p_month=10, p_day=21}
INFO  :         Loading partition {p_year=2014, p_month=09, p_day=04}
INFO  :         Loading partition {p_year=2013, p_month=11, p_day=14}
INFO  :         Loading partition {p_year=2013, p_month=10, p_day=22}
INFO  :         Loading partition {p_year=2014, p_month=08, p_day=26}
INFO  :         Loading partition {p_year=2012, p_month=10, p_day=30}
INFO  :         Loading partition {p_year=2014, p_month=05, p_day=23}
INFO  :         Loading partition {p_year=2012, p_month=10, p_day=29}
INFO  :         Loading partition {p_year=2014, p_month=09, p_day=02}
INFO  :         Loading partition {p_year=__HIVE_DEFAULT_PARTITION__, p_month=__HIVE_DEFAULT_PARTITION__, p_day=__HIVE_DEFAULT_PARTITION__}
INFO  :         Loading partition {p_year=2014, p_month=09, p_day=05}
INFO  :         Loading partition {p_year=2016, p_month=07, p_day=25}
INFO  :         Loading partition {p_year=2016, p_month=08, p_day=10}
efzxgjgh

efzxgjgh1#

此问题已通过使用 DATE_TIME 表和使用内置 HIVE 功能 YEAR();MONTH();DAY(); 反对 TIMESTAMP 数据中的列。这些函数返回 INT 传递给分区。

示例

CREATE EXTERNAL TABLE ${DB_NAME_HIVE}.my_tbl(
unique_id char(32),
some_name varchar(50),
some_group varchar(50),
in_time_start timestamp,
in_time_finish timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '${TARGET_PATH}my_tbl';

CREATE EXTERNAL TABLE ${DB_NAME_HIVE}.my_tbl_p(
unique_id char(32),
some_name varchar(50),
some_group varchar(50),
in_time_start timestamp,
in_time_finish timestamp
)
PARTITIONED BY(p_year int, p_month int, p_day int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS ORC
LOCATION '${TARGET_PATH}my_tbl_p'
;

INSERT INTO TABLE ${DB_NAME_HIVE}.my_tbl_p PARTITION(p_year, p_month, p_day)
SELECT
unique_id,
some_name,
some_group,
in_time_start,
in_time_finish,
year(in_time_start) as p_year,
month(in_time_start) as p_month,
day(in_time_start) as p_day
FROM my_tbl
;
iyfjxgzm

iyfjxgzm2#

对于配置单元中的动态分区,如果存在空值,则将它们写入配置单元默认分区。因此,为了解决这个问题,您可以清理源数据来填充空值或删除它们。

相关问题