Parquet Hive table on S3

Posted by 0g0grzrc on 2021-06-28 in Hive


I am trying to create a Parquet Hive table on S3, and it is failing.

create external table sequencefile_s3
(user_id bigint, 
creation_dt string
)
stored as sequencefile location 's3a://bucket/sequencefile';

The sequence file table works fine.

create external table parquet_s3
(user_id bigint,
creation_dt string)
stored as parquet location 's3a://bucket/parquet';

insert into parquet_s3
select * from hdfs_data;

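The Parquet table fails. The files are created in the S3 bucket/folder and select count(*) works, but select * from parquet_s3 limit 10 does not.
Other notes: I am running the Cloudera distribution 5.8 outside of AWS/EC2. s3a is configured correctly (I can copy files with distcp, and S3 sequencefile and textfile external tables work perfectly).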

Hive parquet amazon-s3 cloudera-cdh

Source: https://stackoverflow.com/questions/38791108/parquet-hive-table-on-s3

1 Answer


camsedfj1#

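First of all, your question is not clear.
What exactly is the problem?
Error logs matter a great deal here: which command did you run, and what output did you get?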
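To collect the details asked for here, the following is a minimal sketch of statements that could be run in the Hive CLI or Beeline. The table name comes from the question. Disabling fetch-task conversion via `hive.fetch.task.conversion` is only an assumption about where the two queries might differ (a simple `select ... limit` can be served by a local fetch task, while `count(*)` normally runs as a job); it is not a confirmed diagnosis.

```sql
-- Show the SerDe, input/output formats, location and table properties
-- that Hive recorded for the table.
DESCRIBE FORMATTED parquet_s3;

-- Re-run the failing query and keep the complete error output / stack trace.
SELECT * FROM parquet_s3 LIMIT 10;

-- Assumption: force the query through a regular job instead of a fetch task,
-- to see whether the failure depends on the execution path.
SET hive.fetch.task.conversion=none;
SELECT * FROM parquet_s3 LIMIT 10;
```

The output of `DESCRIBE FORMATTED` together with the stack trace from the failing `SELECT` is the kind of information that makes the problem reproducible for others.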
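For now I can only say that Hive has its own SequenceFile reader and SequenceFile writer libraries for reading and writing sequence files.
It uses the SequenceFile input and output formats from these classes:
org.apache.hadoop.mapred.SequenceFileInputFormat
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat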
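As an illustration of what these two classes correspond to, here is a sketch of the question's sequence file table with the formats spelled out instead of the `stored as sequencefile` shorthand. The table name `sequencefile_s3_explicit` is hypothetical; the columns and location are taken from the question.

```sql
-- Hypothetical table name; same columns and location as sequencefile_s3.
-- The formats are named explicitly rather than via STORED AS SEQUENCEFILE.
CREATE EXTERNAL TABLE sequencefile_s3_explicit (
  user_id     BIGINT,
  creation_dt STRING
)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 's3a://bucket/sequencefile';
```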
Please add the following table properties clause when creating the Parquet table, then try again:
tblproperties ("parquet.compression"="SNAPPY");
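Put together with the table definition from the question, the suggestion might look like the sketch below. It assumes the property key `parquet.compression`, which is the name used in Hive's Parquet documentation, and it assumes the existing table can safely be dropped and recreated (dropping an external table removes only the metadata, not the files on S3).

```sql
-- Recreate the external table; the files under the S3 location are left in place.
DROP TABLE IF EXISTS parquet_s3;

CREATE EXTERNAL TABLE parquet_s3 (
  user_id     BIGINT,
  creation_dt STRING
)
STORED AS PARQUET
LOCATION 's3a://bucket/parquet'
-- Assumption: Snappy compression requested through the table property.
TBLPROPERTIES ("parquet.compression"="SNAPPY");

-- Reload from the HDFS-backed table used in the question, then re-test the read.
INSERT INTO parquet_s3
SELECT * FROM hdfs_data;

SELECT * FROM parquet_s3 LIMIT 10;
```

Files written before the property was set keep whatever compression they were written with, so if old files remain under the location the read test covers both old and new files.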

Posted 2021-06-28
