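When I execute a SQL query in Impala, I get the following message:
Cannot process row that is bigger than the IO size (row_size=13.42 MB, null_indicators_size=0). To run this query, increase the IO size (--read_size option).
The explain plan is as follows: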
06:SORT
| order by: count(*) DESC
| hosts=1 per-host-mem=unavailable
| tuple-ids=7 row-size=24B cardinality=30000000
|
05:AGGREGATE [FINALIZE]
| output: count(*)
| group by: group_concat(host)
| having: count(*) > 10
| hosts=1 per-host-mem=unavailable
| tuple-ids=6 row-size=24B cardinality=30000000
|
04:AGGREGATE [FINALIZE]
| output: group_concat(host)
| group by: gridsum_id
| hosts=1 per-host-mem=unavailable
| tuple-ids=4 row-size=31B cardinality=30000000
|
08:MERGING-EXCHANGE [UNPARTITIONED]
| order by: g_id ASC, server_time ASC, session_order ASC
| limit: 30000000
| hosts=1 per-host-mem=unavailable
| tuple-ids=2 row-size=46B cardinality=30000000
|
03:TOP-N [LIMIT=30000000]
| order by: g_id ASC, server_time ASC, session_order ASC
| hosts=1 per-host-mem=1.29GB
| tuple-ids=2 row-size=46B cardinality=30000000
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: b.g_id = r.g_id
| runtime filters: RF000 <- r.g_id
| hosts=1 per-host-mem=2.00GB
| tuple-ids=1,0 row-size=65B cardinality=unavailable
|
|--07:EXCHANGE [BROADCAST]
| | hosts=18 per-host-mem=0B
| | tuple-ids=0 row-size=46B cardinality=unavailable
| |
| 00:SCAN HDFS [u_g.botao_route_all r, RANDOM]
| partitions=1/1 files=18 size=213.24MB
| predicates: r.host NOT IN ('-', '(lost)'), r.session_order > 0
| table stats: unavailable
| column stats: unavailable
| hosts=18 per-host-mem=96.00MB
| tuple-ids=0 row-size=46B cardinality=unavailable
|
01:SCAN HDFS [u_g.botao_id b, RANDOM]
partitions=1/1 files=1 size=5.53MB
predicates: b.profile_id = 2473
runtime filters: RF000 -> b.g_id
table stats: 160891 rows total
column stats: unavailable
hosts=1 per-host-mem=32.00MB
tuple-ids=1 row-size=19B cardinality=16089
----------------
Can anyone help me? Thanks a lot.
1 Answer
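This happens because memory is insufficient and the I/O buffer used for spilling has a limited size.
When a spill happens, Impala has to write the intermediate tuples out one row at a time, which requires the I/O buffer to be large enough to hold at least one row. In your case that condition is not met, which causes the error above.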
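The condition described above can be modelled in a few lines. This is only an illustrative sketch, not Impala's actual code: the function name is hypothetical, and the 8 MB buffer size is an assumed value for --read_size that you should verify against your own cluster's configuration.

```python
MB = 1024 * 1024

def row_fits_in_io_buffer(row_size_bytes, null_indicators_bytes, io_buffer_bytes):
    """Return True if a single row (plus its null indicators) fits in one I/O buffer.

    Hypothetical helper that mirrors the condition named in the error message.
    """
    return row_size_bytes + null_indicators_bytes <= io_buffer_bytes

# Values taken from the error message in the question.
row_size = int(13.42 * MB)
null_indicators_size = 0

# ASSUMPTION: the I/O buffer (--read_size) is 8 MB on this cluster.
assumed_read_size = 8 * MB

fits = row_fits_in_io_buffer(row_size, null_indicators_size, assumed_read_size)
print(fits)
```

Under that assumption a 13.42 MB row cannot fit in the buffer, so the spill cannot proceed and the query fails.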
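You can either run the query with more memory, or increase the buffer size through the --read_size option, although the latter is counterintuitive in this case.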
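If you do decide to raise the buffer size, the value has to be at least as large as the biggest row. The sketch below works out a candidate value from the row size in the error message. It assumes the flag takes a byte count, and rounding up to a power of two is simply a convenient choice here, not an Impala requirement; the helper name is hypothetical.

```python
import math

def suggest_read_size(row_size_mb):
    """Return the smallest power-of-two byte count that can hold one row.

    Hypothetical helper; rounding to a power of two is an illustrative choice
    that also leaves headroom, since the reported size is rounded to 2 decimals.
    """
    needed = math.ceil(row_size_mb * 1024 * 1024)
    size = 1
    while size < needed:
        size *= 2
    return size

# Row size reported in the error message: 13.42 MB.
value = suggest_read_size(13.42)
print(f"--read_size={value}")
```

For the 13.42 MB row in the question this gives a 16 MB buffer. Keep in mind that a larger read size increases memory requirements, which is part of why raising it is counterintuitive when the query is already short of memory.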