Can we load a text file separated by :: into a Hive table?

Posted by olqngx59 on 2021-06-27 in Hive


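Is there a way to load a simple text file whose fields are separated by "::" into a Hive table without first replacing "::" with "," and then loading it? Replacing "::" with "," is quick when the file is small, but what if it contains millions of records?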


Source: https://stackoverflow.com/questions/53195963/can-we-load-text-file-separated-by-into-hive-table

1 answer


im9ewurl1#

Try creating the Hive table with RegexSerDe.
Example:
I have a file containing the following text:

i::90
w::99

Create the Hive table:

hive> create external table default.i
(Id STRING,
Name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*)')
STORED AS TEXTFILE;
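To sanity-check a pattern before creating the table, you can reproduce the matching locally. This is only a Python approximation, assuming RegexSerDe's behaviour of requiring the whole line to match and turning each capture group into one column (Hive itself uses Java regex, which behaves the same for this simple pattern). The function name parse_line is hypothetical.

```python
import re
from typing import List, Optional

# Same pattern as the 'input.regex' SerDe property in the table above.
INPUT_REGEX = re.compile(r"(.*?)::(.*)")

def parse_line(line: str) -> Optional[List[str]]:
    """Split one line roughly the way RegexSerDe does: the whole line must
    match, and each capture group becomes one column."""
    m = INPUT_REGEX.fullmatch(line)
    if m is None:
        # Assumption: Hive returns NULLs for a non-matching line; None stands in here.
        return None
    return list(m.groups())

for row in ["i::90", "w::99"]:
    print(parse_line(row))
# ['i', '90']
# ['w', '99']
```

Because (.*?) is lazy, the first "::" ends the first column and everything after it, including any further "::", goes into the second column.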

Select from the Hive table:

hive> select * from i;
+-------+---------+--+
| i.id  | i.name  |
+-------+---------+--+
| i     | 90      |
| w     | 99      |
+-------+---------+--+

If you want to skip a header line, use the following syntax:

hive> create external table default.i
(Id STRING,
Name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*)')
STORED AS TEXTFILE
tblproperties ('skip.header.line.count'='1');
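Skipping the header matters here because a header such as id::name matches the same regex and would otherwise show up as a data row. The sketch below mimics that locally, assuming skip.header.line.count simply drops the first N lines of each file before the regex is applied; read_table_file is a hypothetical name.

```python
import re
from typing import Iterable, List, Optional

INPUT_REGEX = re.compile(r"(.*?)::(.*)")

def read_table_file(lines: Iterable[str], skip_header: int = 1) -> List[Optional[List[str]]]:
    """Parse one file's lines, dropping the first `skip_header` lines
    (assumed to mirror 'skip.header.line.count')."""
    rows: List[Optional[List[str]]] = []
    for lineno, line in enumerate(lines):
        if lineno < skip_header:
            continue  # header line(s) are not parsed at all
        m = INPUT_REGEX.fullmatch(line.rstrip("\n"))
        rows.append(list(m.groups()) if m else None)
    return rows

sample = ["id::name\n", "i::90\n", "w::99\n"]
print(read_table_file(sample))
# [['i', '90'], ['w', '99']]
```

With skip_header=0 the header comes back as the row ['id', 'name'], which is exactly what the table property avoids.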

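Update:
Check whether there are any older files in your table's location; Hive reads every file in that directory as part of the table. If there are some, delete them (if you don't want them).
1. Create the Hive table as: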

create external table <db_name>.<table_name>
(col1 STRING,
col2 STRING,
col3 string,
col4 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*?)::(.*?)::(.*)')
STORED AS TEXTFILE;
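The four-column regex follows a simple rule: a lazy group (.*?) for every column except the last, a greedy (.*) for the last, joined by "::". Below is a sketch of that rule for any column count; build_input_regex and parse_line are hypothetical helpers, and the delimiter is inserted literally (fine for "::", but a regex metacharacter such as "|" would need escaping first).

```python
import re
from typing import List, Optional

def build_input_regex(num_cols: int, delim: str = "::") -> str:
    # Lazy groups for all but the last column, greedy group for the last one.
    groups = ["(.*?)"] * (num_cols - 1) + ["(.*)"]
    return delim.join(groups)

def parse_line(line: str, num_cols: int = 4) -> Optional[List[str]]:
    # Whole-line match, one capture group per column (None if no match).
    m = re.fullmatch(build_input_regex(num_cols), line)
    return list(m.groups()) if m else None

print(build_input_regex(4))          # (.*?)::(.*?)::(.*?)::(.*)
print(parse_line("a::b::c::d"))      # ['a', 'b', 'c', 'd']
print(parse_line("a::b::c::d::e"))   # ['a', 'b', 'c', 'd::e']
print(parse_line("a::b"))            # None
```

Note the design consequence: a line with extra delimiters does not fail, its surplus fields end up inside the last column, while a line with too few delimiters does not match at all.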

2. Then run (OVERWRITE replaces any files already in the table location):

load data local inpath 'Source path' overwrite into table <db_name>.<table_name>;

Answered 2021-06-27
