无法在配置单元分区表中加载数据

ktca8awb  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(375)

我已使用以下查询在配置单元中创建了一个表:

create table if not exists employee(CASE_NUMBER String,
                                         CASE_STATUS String,
                                         CASE_RECEIVED_DATE DATE,
                                         DECISION_DATE  DATE,
                                         EMPLOYER_NAME STRING,
                                         PREVAILING_WAGE_PER_YEAR BIGINT,
                                         PAID_WAGE_PER_YEAR BIGINT,
                                         order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';

我尝试使用以下查询将数据加载到create表中:

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee  partition (JOB_TITLE_SUBGROUP);

对于分区表,我甚至设置了以下配置:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

但在执行加载查询时出现以下错误:
您的查询有以下错误:
编译语句时出错:failed:semanticexception org.apache.hadoop.hive.ql.metadata.hivexception:metaexception(message:invalid partition 关键字和值;键[job\u title\u subgroup,],值[])
请帮忙。

nqwrtyyt

nqwrtyyt1#

如果要将数据加载到配置单元分区中,则必须在 LOAD DATA 查询。所以在本例中,您的查询是这样的。

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");

其中“value”是加载数据的分区的名称。原因是,配置单元将使用“值”来创建存储.csv的目录,该目录如下所示: .../employee/JOB_TITLE_SUBGROUP=Value . 我希望这有帮助。
查看文档以了解 LOAD DATA 语法。
编辑
由于表具有动态分区,一种解决方案是将.csv加载到外部表(例如employee\u external)中,然后执行 INSERT 命令如下:

INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
68bkxrlz

68bkxrlz2#

我可能会晚一点回复,但可以尝试以下步骤:
首先设置以下属性:

Ø set hive.exec.dynamic.partition.mode=nonstrict;
Ø set hive.exec.dynamic.partition=true;

先创建临时表:

CREATE EXTERNAL TABLE IF NOT EXISTS employee_temp(
ID STRING,
Name STRING,
Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");

在临时表中加载数据:

hive> LOAD DATA INPATH 'filepath/employee.csv' OVERWRITE INTO TABLE employee;

创建分区表:

CREATE EXTERNAL TABLE IF NOT EXISTS employee_part(
ID STRING,
Name STRING)
PARTITIONED BY (Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");

从中间/临时表将数据加载到分区表:

INSERT OVERWRITE TABLE employee_part PARTITION (SALARY) SELECT * FROM employee;

相关问题