clickhouse mergetree慢速选择和订购方式

gxwragnw  于 2021-07-15  发布在  ClickHouse
关注(0)|答案(2)|浏览(556)

我开始学习ch,似乎在试图提高查询速度时遇到了死胡同,表是这样创建的

CREATE TABLE default.stats(
  aa String, 
  ab String, 
  user_id UInt16, 
  ac UInt32,  
  ad UInt8, 
  ae UInt8, 
  created_time DateTime, 
  created_date Date, 
  af UInt8, 
  ag UInt32, 
  ah UInt32, 
  ai String, 
  aj String) 
ENGINE = MergeTree 
PARTITION BY toYYYYMM(created_time) 
ORDER BY(created_time, user_id)

我正在进行这样的查询

SELECT ad, created_time, ab, aa, user_id, ac, ag, af 
FROM stats 
WHERE user_id = 1 AND lowerUTF8(ab) = 'xxxxxxxxx' AND ad != 12 
ORDER BY created_time DESC 
LIMIT 50 OFFSET 0

这是集合中50行的结果。已用时间:2.881秒。处理7462万行
如果我运行相同的查询,但不包含order部分,那么集合中有50行。已用时间:0.020秒。处理4.915万行
如果理论上查询只需对10k左右的行(返回的所有行没有限制)排序,为什么它似乎要处理表中的所有行?我缺少什么和/或如何提高ch的速度?

fnvucqvd

fnvucqvd1#

按创建时间描述、用户id尝试订购
在clickhouse 19.14.3.32019-09-10版本中实现了按顺序优化读取功能

juud5qan

juud5qan2#

CH 19.17.4.11
CREATE TABLE stats
(
    `aa` String,
    `ab` String,
    `user_id` UInt16,
    `ac` UInt32,
    `ad` UInt8,
    `ae` UInt8,
    `created_time` DateTime,
    `created_date` Date,
    `af` UInt8,
    `ag` UInt32,
    `ah` UInt32,
    `ai` String,
    `aj` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_time)
ORDER BY (created_time, user_id)

insert into stats(created_time, user_id) select toDateTime(intDiv(number,100)), number%103 from numbers(100000000)

SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
ORDER BY created_time DESC
LIMIT 5 OFFSET 0

5 rows in set. Elapsed: 0.013 sec. Processed 835.84 thousand rows,

set  optimize_read_in_order = 0

SELECT ad, created_time, ab, aa, user_id, ac, ag, af
FROM stats
ORDER BY created_time DESC
LIMIT 5 OFFSET 0

5 rows in set. Elapsed: 0.263 sec. Processed 100.00 million rows

检查差异集optimize\u read \u in \u order=0与set optimize\u read \u in \u order=1
我不明白为什么优化按顺序阅读在你的情况下不起作用。

相关问题