I have a Hive table with the following schema:
CREATE EXTERNAL TABLE db_test.user_arry(
cstid string,
prdctsslctd array<string>,
indvprc array<bigint>,
dscntamt array<bigint>,
prdctsrjctd array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/location/on/a/hadoop/'
The data in it looks like this:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 ["m_jns","cbyht"] ["23","6"] ["1","1"] ["shs","jkt"]
jju89o0 ["top","jeans_wmn"] ["55","45"] [NULL] [NULL]
ju34hd ["laychps","candy","toy"] ["3","5","67"] ["12","8"] ["candy"]
I am trying to pull this data into a table where every column has the data type string:
CREATE EXTERNAL TABLE db_test.user_strng(
cstid string,
prdctsslctd string,
indvprc string,
dscntamt string,
prdctsrjctd string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION
'/location/on/a/hadoop/';
using:
insert into db_test.user_strng select * from db_test.user_arry;
Actual output:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 m_jnscbyht 236 11 shsjkt
jju89o0 topjeans_wmn 5545 NULL NULL
ju34hd laychpscandytoy 3567 128 candy
Expected output:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 "m_jns","cbyht" "23","6" "1","1" "shs","jkt"
jju89o0 "top","jeans_wmn" "55","45" NULL NULL
ju34hd "laychps","candy","toy" "3","5","67" "12","8" "candy"
I'm not sure what went wrong, or what I missed.
Update_1
Output in the table after converting array<bigint> to array<string>:
ALTER TABLE user_arry CHANGE indvprc indvprc array<string>;
ALTER TABLE user_arry CHANGE dscntamt dscntamt array<string>;
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 ["m_jns","cbyht"] ["23","6"] ["1","1"] ["shs","jkt"]
jju89o0 ["top","jeans_wmn"] ["55","45"] [] []
ju34hd ["laychps","candy","toy"] ["3","5","67"] ["12","8"] ["candy"]
Final output in the table where all data types are string:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 m_jns cbyht 23 6 1 1 shs jkt
jju89o0 top jeans_wmn 55 45
ju34hd laychps candy toy 3 5 67 12 8 candy
Still not getting the desired output.
Update_2
As suggested, I changed FIELDS TERMINATED BY ',' to FIELDS TERMINATED BY '\t'. I am now getting the data in the desired format.
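For reference, the delimiter change might look like the sketch below. It assumes the change was applied to the all-string target table db_test.user_strng (the post does not say which table was altered); every other clause is copied from the definition earlier in the question.

```sql
-- Sketch only: same columns and location as the original user_strng table,
-- with the field delimiter switched from ',' to a tab so that commas inside
-- the values no longer collide with the column separator.
-- Assumption: an existing db_test.user_strng has been dropped first.
CREATE EXTERNAL TABLE db_test.user_strng(
  cstid string,
  prdctsslctd string,
  indvprc string,
  dscntamt string,
  prdctsrjctd string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION
'/location/on/a/hadoop/';
```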
1 Answer
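Change all array types to array<string>: judging by your data example, array<bigint> is not the correct type for values such as "23.45". array<string> should work fine with your data file. Use concat_ws to convert each array into a comma-separated string: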
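A minimal sketch of that conversion, assuming all four array columns in db_test.user_arry have already been changed to array<string> (concat_ws accepts only string and array<string> arguments), and reusing the table and column names from the question:

```sql
-- Sketch only: join the elements of each array<string> column with ','
-- and write the resulting plain strings into the all-string table.
INSERT INTO db_test.user_strng
SELECT cstid,
       concat_ws(',', prdctsslctd) AS prdctsslctd,
       concat_ws(',', indvprc)     AS indvprc,
       concat_ws(',', dscntamt)    AS dscntamt,
       concat_ws(',', prdctsrjctd) AS prdctsrjctd
FROM db_test.user_arry;
```

Because the joined values contain commas themselves, the target table probably needs a field delimiter other than ',' (for example '\t') so that the columns can be read back unambiguously.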