Why does Hive look for a file under the partition directory of a partitioned table?

vlf7wbxs posted on 2021-06-03 in Hadoop


I have a simple table in Hive. It has only one partition:

show partitions hive_test;                       
OK
pt=20130805000000
Time taken: 0.124 seconds

But when I run a simple SQL query, it turns out Hive looks for a data file inside the folder 20130805000000. Why doesn't it just use 20130805000000 itself?
The SQL statement:

SELECT buyer_id AS USER_ID from hive_test limit 1;

This is the exception:

java.io.IOException: /group/myhive/test/hive/hive_test/pt=20130101000000/data
doesn't exist!
   at org.apache.hadoop.hdfs.DFSClient.listPathWithLocations(DFSClient.java:1045)
   at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:352)
   at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.listLocatedStatus(ChRootedFileSystem.java:270)
   at org.apache.hadoop.fs.viewfs.ViewFileSystem.listLocatedStatus(ViewFileSystem.java:851)
   at org.apache.hadoop.hdfs.Yunti3FileSystem.listLocatedStatus(Yunti3FileSystem.java:349)
   at org.apache.hadoop.mapred.SequenceFileInputFormat.listLocatedStatus(SequenceFileInputFormat.java:49)
   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:242)
   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:261)
   at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1238)

My question is: why does Hive look for the file "/group/myhive/test/hive/hive_test/pt=20130101000000/data" instead of the directory "/group/myhive/test/hive/hive_test/pt=20130101000000/"?

sql hadoop Hive

Source: https://stackoverflow.com/questions/18668800/why-hive-use-file-from-other-files-under-the-partition-table

1 Answer


m528fe3b1#

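You are getting the error because you created partitions on the Hive table but did not specify a partition in your select statement.
In Hive's partitioning scheme, a table's data is split across multiple partitions. Each partition corresponds to one specific value of the partition column and is stored as a subdirectory of the table's directory on HDFS. When the table is queried with a filter on the partition column, Hive reads only the partitions that the query needs.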
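The directory layout and pruning described above can be modelled with a small Python sketch. It assumes Hive's default layout (one `<column>=<value>` path segment per partition column, nested in declaration order) and ignores the escaping Hive applies to special characters in partition values. `partition_dir` and `prune` are hypothetical helpers for illustration, not Hive APIs; the table path is taken from the question.

```python
from typing import Dict, List

# Table directory copied from the question; illustrative only.
TABLE_DIR = "/group/myhive/test/hive/hive_test"


def partition_dir(table_dir: str, spec: Dict[str, str]) -> str:
    """Return the default directory for a partition spec.

    Assumes Hive's default layout: one "<column>=<value>" path segment per
    partition column, nested in the order the columns appear in `spec`.
    Hive's escaping of special characters in values is NOT modelled here.
    """
    segments = [f"{col}={val}" for col, val in spec.items()]
    return "/".join([table_dir.rstrip("/")] + segments)


def prune(partitions: List[Dict[str, str]], col: str, value: str) -> List[Dict[str, str]]:
    """Keep only partitions matching an equality filter such as WHERE pt='...'."""
    return [p for p in partitions if p.get(col) == value]


# The single partition reported by `show partitions hive_test`.
partitions = [{"pt": "20130805000000"}]

selected = prune(partitions, "pt", "20130805000000")
print([partition_dir(TABLE_DIR, p) for p in selected])
# → ['/group/myhive/test/hive/hive_test/pt=20130805000000']
```

A filter value that matches no partition yields an empty list, so no directory would be read. The sketch covers only the default layout; a partition can also be registered at a custom location, which it does not model.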
Specify the partition in your select query, for example:

select buyer_id AS USER_ID from hive_test where pt='20130805000000' limit 1;

For more details on Hive partitions, see the link.

2021-06-03
