Cannot process row that is bigger than the IO size

fjnneemd  posted on 2021-06-26  in  Impala

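When I execute a SQL query in Impala, I get the following message:
Cannot process row that is bigger than the IO size (row_size=13.42 MB, null_indicators_size=0). To run this query, increase the io size (--read_size option).
The explain plan is as follows: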

06:SORT
|  order by: count(*) DESC
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=7 row-size=24B cardinality=30000000
|
05:AGGREGATE [FINALIZE]
|  output: count(*)
|  group by: group_concat(host)
|  having: count(*) > 10
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=6 row-size=24B cardinality=30000000
|
04:AGGREGATE [FINALIZE]
|  output: group_concat(host)
|  group by: gridsum_id
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=4 row-size=31B cardinality=30000000
|
08:MERGING-EXCHANGE [UNPARTITIONED]
|  order by: g_id ASC, server_time ASC, session_order ASC
|  limit: 30000000
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=2 row-size=46B cardinality=30000000
|
03:TOP-N [LIMIT=30000000]
|  order by: g_id ASC, server_time ASC, session_order ASC
|  hosts=1 per-host-mem=1.29GB
|  tuple-ids=2 row-size=46B cardinality=30000000
|
02:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: b.g_id = r.g_id
|  runtime filters: RF000 <- r.g_id
|  hosts=1 per-host-mem=2.00GB
|  tuple-ids=1,0 row-size=65B cardinality=unavailable
|
|--07:EXCHANGE [BROADCAST]
|  |  hosts=18 per-host-mem=0B
|  |  tuple-ids=0 row-size=46B cardinality=unavailable
|  |
|  00:SCAN HDFS [u_g.botao_route_all r, RANDOM]
|     partitions=1/1 files=18 size=213.24MB
|     predicates: r.host NOT IN ('-', '(lost)'), r.session_order > 0
|     table stats: unavailable
|     column stats: unavailable
|     hosts=18 per-host-mem=96.00MB
|     tuple-ids=0 row-size=46B cardinality=unavailable
|
01:SCAN HDFS [u_g.botao_id b, RANDOM]
   partitions=1/1 files=1 size=5.53MB
   predicates: b.profile_id = 2473
   runtime filters: RF000 -> b.g_id
   table stats: 160891 rows total
   column stats: unavailable
   hosts=1 per-host-mem=32.00MB
   tuple-ids=1 row-size=19B cardinality=16089
----------------

Can anyone help me? Thanks a lot.

jtw3ybtb

This happens because the query runs short of memory and the IO buffer used for spilling has a limited size.

Status BufferedTupleStream::NewBlockForWrite(int min_size, bool* got_block) {
  DCHECK(!closed_);
  // A single row must fit into one IO block, otherwise it cannot be written out.
  if (min_size > block_mgr_->max_block_size()) {
    return Status(Substitute("Cannot process row that is bigger than the IO size "
        "(row_size=$0). To run this query, increase the io size (--read_size option).",
        PrettyPrinter::Print(min_size, TCounterType::BYTES)));
  }
  // ... (rest of the function omitted)
}

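When spilling happens, Impala has to write the intermediate tuples out one row at a time, which requires the IO buffer to be large enough to hold at least one row. In your case this condition is not met, which produces the error above.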
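To make the condition concrete, here is a minimal standalone sketch of the same check, using the 13.42 MB row size from the question. It is not Impala code: `CheckRowFitsInBlock`, `MebibytesToBytes` and `kAssumedMaxBlockSize` are hypothetical names, and the 8 MiB block size is an assumption for illustration only, so check the actual `--read_size` value on your cluster.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Assumed IO block size for illustration (8 MiB); not read from any cluster.
constexpr int64_t kAssumedMaxBlockSize = 8LL * 1024 * 1024;

// Converts a size in MiB, as printed in the error message, to bytes (rounded up).
int64_t MebibytesToBytes(double mib) {
  return static_cast<int64_t>(std::ceil(mib * 1024.0 * 1024.0));
}

// Mirrors the comparison in the excerpt above: a row is rejected only when it
// is strictly larger than one IO block. Returns an empty string when the row
// fits, otherwise an error text modeled on the one in the question.
std::string CheckRowFitsInBlock(int64_t row_size, int64_t max_block_size) {
  if (row_size > max_block_size) {
    return "Cannot process row that is bigger than the IO size (row_size=" +
           std::to_string(row_size) + " bytes)";
  }
  return "";
}
```

Under that assumption, `CheckRowFitsInBlock(MebibytesToBytes(13.42), kAssumedMaxBlockSize)` returns a non-empty error, while a row of exactly `kAssumedMaxBlockSize` bytes still passes because the comparison is strict.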
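You can either run the query with more memory, so that it does not need to spill, or increase the IO buffer size through the --read_size option, although the latter is counterintuitive in this case.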
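If you do go the --read_size route, the value has to be at least as large as the biggest row. The sketch below only computes a candidate value; it assumes the flag is given in bytes, and rounding up to a power of two is a conservative choice on my part, not something the source states as a requirement. `NextPowerOfTwo` and `SuggestReadSizeFlag` are hypothetical helpers, not part of Impala.

```cpp
#include <cstdint>
#include <string>

// Smallest power of two that is greater than or equal to n (for n >= 1).
int64_t NextPowerOfTwo(int64_t n) {
  int64_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Builds a candidate flag string for the largest row the query must hold.
// Assumption: --read_size takes a byte count.
std::string SuggestReadSizeFlag(int64_t largest_row_bytes) {
  return "--read_size=" + std::to_string(NextPowerOfTwo(largest_row_bytes));
}
```

For the 13.42 MB row from the question (about 14071890 bytes), `SuggestReadSizeFlag(14071890)` gives `--read_size=16777216`, that is, 16 MiB.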
