I have a dataset like this:
John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600
Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601
Todd Jones^A70000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700
Bill King^A60000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100
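The `^A`, `^B` and `^C` sequences above are presumably caret notation for the control characters with codes 1, 2 and 3 (Ctrl-A, Ctrl-B, Ctrl-C). Those are the bytes that `'\001'`, `'\002'` and `'\003'` denote in a Hive `ROW FORMAT DELIMITED` clause. The minimal Python sketch below assumes the records exist as plain text containing the two-character `^X` sequences and converts them into the actual control bytes. The helper names `caret_to_control` and `write_records` and the output path `employees.txt` are hypothetical.

```python
import re

# Caret notation: ^A is Ctrl-A (code 1), ^B is code 2, ^C is code 3, and so on.
CARET = re.compile(r"\^([A-Z])")


def caret_to_control(text: str) -> str:
    # Hypothetical helper: treats every '^' followed by an uppercase letter as
    # caret notation. That holds for the sample records, not for arbitrary text.
    return CARET.sub(lambda m: chr(ord(m.group(1)) - ord("A") + 1), text)


# The four sample records, one record per line, still in caret notation.
RECORDS = [
    "John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600",
    "Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601",
    "Todd Jones^A70000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700",
    "Bill King^A60000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100",
]


def write_records(path: str, records: list) -> None:
    # newline="\n" keeps LF line endings, matching LINES TERMINATED BY '\n'.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for record in records:
            f.write(caret_to_control(record) + "\n")


if __name__ == "__main__":
    write_records("employees.txt", RECORDS)  # hypothetical output path
```

The regex consumes only one letter after each caret, so `^ABill King` becomes `\x01` followed by `Bill King`. Running `cat -v` on the resulting file should display the `^A`-style notation again, because that is how `cat -v` renders control characters.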
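I have read the section "Text Encoding of Data Values" in the Hive documentation, which explains how Hive decodes data files that use different delimiters. For the dataset above, I created a table with the following schema: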
CREATE TABLE employees
(
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
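The `ROW FORMAT DELIMITED` clause above assigns one delimiter to each nesting level. Fields are split on `\001`. Array items, map entries and struct members are split on `\002`. Map keys are split from values on `\003`. The simplified Python model below shows how one line would be decoded under those rules, assuming the file contains the actual control bytes `\x01`, `\x02` and `\x03`. It is an illustration, not Hive's actual text SerDe (`LazySimpleSerDe`), and the function names `parse_employee` and `to_number` are hypothetical. `None` stands in for Hive's `NULL`.

```python
from typing import Optional

# Delimiters from the DDL; '\001' etc. in the DDL are octal escapes for these bytes.
FIELD_DELIM = "\x01"       # FIELDS TERMINATED BY '\001'
COLLECTION_DELIM = "\x02"  # COLLECTION ITEMS TERMINATED BY '\002' (array items, map entries, struct members)
MAP_KEY_DELIM = "\x03"     # MAP KEYS TERMINATED BY '\003'
NUM_COLUMNS = 5            # name, salary, subordinates, deductions, address


def to_number(text: Optional[str], kind=float):
    # Missing or unparseable values become None, standing in for NULL.
    if text is None:
        return None
    try:
        return kind(text)
    except ValueError:
        return None


def parse_employee(line: str) -> dict:
    """Hypothetical, simplified decoder for one line of the employees table."""
    fields = line.rstrip("\n").split(FIELD_DELIM)
    # In this sketch, missing trailing columns become None and extra ones are dropped.
    fields = (fields + [None] * NUM_COLUMNS)[:NUM_COLUMNS]
    name, salary, subs, deds, addr = fields

    # ARRAY<STRING>: items separated by the collection delimiter.
    subordinates = subs.split(COLLECTION_DELIM) if subs is not None else None

    # MAP<STRING, FLOAT>: entries separated by \x02, key and value by \x03.
    deductions = None
    if deds is not None:
        deductions = {}
        for entry in deds.split(COLLECTION_DELIM):
            key, _, value = entry.partition(MAP_KEY_DELIM)
            deductions[key] = to_number(value)  # '' (no key delimiter) -> None

    # STRUCT<street, city, state, zip>: members separated by the collection delimiter.
    address = None
    if addr is not None:
        street, city, state, zip_code = (addr.split(COLLECTION_DELIM) + [None] * 4)[:4]
        address = {"street": street, "city": city, "state": state,
                   "zip": to_number(zip_code, int)}

    return {"name": name, "salary": to_number(salary), "subordinates": subordinates,
            "deductions": deductions, "address": address}
```

For the first record written with real control bytes, this sketch yields `subordinates == ["Mary Smith", "Todd Jones"]` and a `deductions` map with three float values. A line that contains the literal two-character text `^A` instead of the byte `\x01` is not split at all by this sketch: the whole line ends up in `name` and every other column is `None`.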
However, the data is not loaded into the table correctly:
|employees.name|employees.salary|employees.subordinates|employees.deductions|employees.address|
|--------------|----------------|----------------------|--------------------|-----------------|
|John Doe|NULL|["AMary Smith"]|{"BTodd Jones":null}|{"street":"AFederal Taxes","city":null,"state":null,"zip":null}|
|Mary Smith|NULL|["ABill King"]|{"AFederal Taxes":null}|{"street":"C.2","city":null,"state":null,"zip":null}|
|Bill King|NULL|["AFederal Taxes"]|{"C.15":null}|{"street":"BState Taxes","city":null,"state":null,"zip":null} |
Can anyone explain why this goes wrong, even though I followed the documentation's example on page 45? Thanks in advance.