Is there any way to compute statistics on a Hive table for all partitions with a single ANALYZE command?

Posted by fsi0uk1n on 2021-06-03 in Hadoop


The syntax I have seen in Hive for computing statistics seems to suggest that the answer to the question in the title is "no":

ANALYZE TABLE [TABLENAME] PARTITION(parcol1=…, partcol2=….) COMPUTE STATISTICS

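Still, I wanted to ask here, because I am surprised that it would always be necessary to write a script that iterates over the partitions and generates one statement per partition. We currently have about 1000 partitions on this small table, and that number will grow by an order of magnitude.
By the way, I tried the following without specifying a partition: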

hive> analyze table metrics compute statistics;
FAILED: SemanticException [Error 10115]: Table is partitioned and partition specification is needed
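For reference, the per-partition workaround described in the question could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name `analyze_statements` is hypothetical, the input is assumed to be lines in the `col=val/col2=val2` form that `SHOW PARTITIONS` prints, and every partition value is quoted as a string literal on the assumption that Hive will cast it where needed. Values containing escaped special characters are not handled.

```python
def analyze_statements(table, partition_lines):
    """Build one ANALYZE statement per partition (hypothetical helper).

    partition_lines: iterable of strings such as "day=20150831" or
    "year=2015/month=08", assumed to come from SHOW PARTITIONS output.
    """
    statements = []
    for line in partition_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the captured output
        # Split "a=1/b=2" into [("a", "1"), ("b", "2")].
        pairs = [part.split("=", 1) for part in line.split("/")]
        # Assumption: quote every value as a string literal.
        spec = ", ".join(f"{col}='{val}'" for col, val in pairs)
        statements.append(
            f"ANALYZE TABLE {table} PARTITION({spec}) COMPUTE STATISTICS;"
        )
    return statements


if __name__ == "__main__":
    # Sample partition names; real ones would come from SHOW PARTITIONS.
    for stmt in analyze_statements("metrics", ["day=20150831", "day=20150901"]):
        print(stmt)
```

The generated statements could then be written to a file and passed to the Hive CLI, which is exactly the kind of scripting the question hopes to avoid.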

hadoop Hive table-statistics

Source: https://stackoverflow.com/questions/18515938/any-way-to-compute-statistics-on-a-hive-table-for-all-partitions-with-a-single-a

3 Answers


ddhy6vgd1#

I am using the latest Hive 1.2, and the following command works just fine:

hive> analyze table member partition(day) compute statistics noscan;
Partition mobi_mysql.member{day=20150831} stats: [numFiles=7, numRows=-1, totalSize=4735943322, rawDataSize=-1]
Partition mobi_mysql.member{day=20150901} stats: [numFiles=7, numRows=117512, totalSize=19741804, rawDataSize=0]
Partition mobi_mysql.member{day=20150902} stats: [numFiles=7, numRows=-1, totalSize=17734601, rawDataSize=-1]
Partition mobi_mysql.member{day=20150903} stats: [numFiles=7, numRows=-1, totalSize=13091084, rawDataSize=-1]
OK
Time taken: 2.089 seconds


hi3rlvi22#

According to the Hive manual, if no partition spec is given, statistics are gathered for the whole table: https://cwiki.apache.org/confluence/display/hive/statsdev

When the user issues that command, he may or may not specify the partition specs. If the user doesn't specify any partition specs, statistics are gathered for the table as well as all the partitions (if any).


zbdgwd5y3#

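Yes, you can.
At least as of Hive v0.13, which is the version I am on. Just use the partition spec syntax without specific values (that is, without the =… bits).
If you are using FOR COLUMNS, you cannot, because of the following bug: https://issues.apache.org/jira/browse/hive-4861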
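To make the difference from the question's syntax concrete, here is a small sketch that builds the value-less form of the statement, listing only the partition column names, as in the `partition(day)` command shown in the first answer. The helper name `analyze_all_partitions` and its parameters are hypothetical, and whether the resulting statement is accepted depends on the Hive version, as the answers note.

```python
def analyze_all_partitions(table, partition_cols, noscan=False):
    """Build a single ANALYZE statement with a value-less partition spec.

    partition_cols: partition column names only, with no "=value" part.
    noscan: append NOSCAN, as in the first answer's command.
    """
    spec = ", ".join(partition_cols)
    suffix = " NOSCAN" if noscan else ""
    return f"ANALYZE TABLE {table} PARTITION({spec}) COMPUTE STATISTICS{suffix};"


if __name__ == "__main__":
    # Matches the form of the command in the first answer (different casing).
    print(analyze_all_partitions("member", ["day"], noscan=True))
```

One statement of this form covers every partition, so no per-partition loop is needed.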
