需要帮助优化order by和count查询,我有数百万(约300万)行的表。
我必须连接4个表并获取记录,当我运行简单的查询时,它只需要毫秒就可以完成,但是当我试图通过离开连接表来计数或排序时,它会被无限时间卡住。
请看下面的案例。
数据库服务器配置:
CPU Number of virtual cores: 4
Memory(RAM): 16 GiB
Network Performance: High
每个表中的行:
tbl_customers - #Rows: 20 million.
tbl_customers_address - #Row 25 million.
tbl_shop_setting - #Rows 50k
aio_customer_tracking - #Rows 5k
表架构:
CREATE TABLE `tbl_customers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(225) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`accepts_marketing` TINYINT(1) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_order_id` BIGINT(20) NULL DEFAULT NULL,
`total_spent` DECIMAL(12,2) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`verified_email` TINYINT(4) NULL DEFAULT NULL,
`updated_at` DATETIME NULL DEFAULT NULL,
`created_at` DATETIME NULL DEFAULT NULL,
`date_updated` DATETIME NULL DEFAULT NULL,
`date_created` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `shop_id` (`shop_id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_customers_address` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`customer_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_address_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_customer_id` BIGINT(20) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL,
`last_name` VARCHAR(50) NULL DEFAULT NULL,
`company` VARCHAR(50) NULL DEFAULT NULL,
`address1` VARCHAR(250) NULL DEFAULT NULL,
`address2` VARCHAR(250) NULL DEFAULT NULL,
`city` VARCHAR(50) NULL DEFAULT NULL,
`province` VARCHAR(50) NULL DEFAULT NULL,
`country` VARCHAR(50) NULL DEFAULT NULL,
`zip` VARCHAR(15) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL,
`name` VARCHAR(50) NULL DEFAULT NULL,
`province_code` VARCHAR(5) NULL DEFAULT NULL,
`country_code` VARCHAR(5) NULL DEFAULT NULL,
`country_name` VARCHAR(50) NULL DEFAULT NULL,
`longitude` VARCHAR(250) NULL DEFAULT NULL,
`latitude` VARCHAR(250) NULL DEFAULT NULL,
`default` TINYINT(1) NULL DEFAULT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `customer_id` (`customer_id`),
INDEX `shopify_address_id` (`shopify_address_id`),
INDEX `shopify_customer_id` (`shopify_customer_id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_shop_setting` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`shop_name` VARCHAR(300) NOT NULL COLLATE 'latin1_swedish_ci',
PRIMARY KEY (`id`),
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `aio_customer_tracking` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(255) NULL DEFAULT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`domain` VARCHAR(255) NULL DEFAULT NULL,
`web_session_count` INT(11) NOT NULL,
`last_seen_date` DATETIME NULL DEFAULT NULL,
`last_contact_date` DATETIME NULL DEFAULT NULL,
`last_email_open` DATETIME NULL DEFAULT NULL,
`created_date` DATETIME NOT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id_shop_id` (`shopify_customer_id`, `shop_id`),
INDEX `last_seen_date` (`last_seen_date`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
正在运行和未运行的查询案例:
1. Running: Below query fetch the records by joining all the 4 tables, It takes only 0.300 ms.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
2. Not running: Simply when try to get the count of these row stuk the query, I waited 10 min but still running.
SELECT
COUNT(DISTINCT c.shopify_customer_id) -- what makes #2 different
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
3. Not running: In the #1 query we simply put the 1 Order by clause and it get stuck, I waited 10 min but still running. I study query optimization some article and tried by indexing, Right Join etc.. but still not working.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
ORDER BY `t`.`last_seen_date` -- what makes #3 different
LIMIT 20
解释问题1:
解释问题2:
解释问题3:
欢迎提出优化查询、表结构的建议。
我想做的是: tbl_customers
表包含客户信息, tbl_customer_address
表包含客户的地址(一个客户可能有多个地址),以及 aio_customer_tracking
表包含客户的访问记录 last_seen_date
是访问日期。
现在,我只想获取和统计客户,以及他们的地址和访问信息。另外,我可以按这3个表中的任何列排序,在我的示例中,我是按上次看到的日期排序(默认顺序)。希望这个解释有助于理解我在做什么。
4条答案
按热度按时间z9ju0rcb1#
shopify_customer_id
是独一无二的tbl_customers
表,然后在第二个查询中,为什么在中使用distinct和group byshopify_customer_id
列?请把它扔掉。
jslywgbw2#
在查询#1(而不是其他两个)中,优化器可以使用
缩短查询时间
这是因为它可以在索引的20项之后停止。由于派生表(子查询)的原因,查询速度不是很快
t
)大约有51k行。查询#2可能很慢,因为优化器没有注意到并删除多余的
DISTINCT
. 相反,它可能认为20岁以后就停不下来了。查询#3必须完全遍历表
c
得到每一个shopify_customer_id
组。这是因为ORDER BY
防止短路到达LIMIT 20
.a中的列
GROUP BY
必须包含SELECT
除了由group by列唯一定义的任何列。既然你说过一个地址可以有多个shopify_customer_id
,然后获取ca.address1
不适合与GROUP BY shopify_customer_id
. 类似地,子查询对于last_seen_date, last_contact_date
.在
aio_customer_tracking
,这种变化(对“覆盖”索引)可能会有一些帮助:到
剖析目标
现在,我只想。。。数一数顾客
要计算客户数量,请这样做,但不要试图将其与“获取”结合起来:
现在,我只想把。。。顾客们。。。
tbl#客户-#行:2000万。
你肯定不想拍到两千万排!我不想去想该怎么做。请澄清。我不接受这么多行的分页。也许有一个
WHERE
条款??这个WHERE
子句通常是优化最重要的部分!现在,我只想把。。。客户,其中一个地址和访问信息。
假设
WHERE
过滤到“少数”客户,然后JOINing
转到另一个表以获取“any”地址和“any”访问信息,可能有问题和/或效率低下。要求“第一个”或“最后一个”而不是“任何”不会更容易,但可能更有意义。我建议你的用户界面先找几个客户,然后如果用户愿意,转到另一个包含所有地址和访问量的页面。或者访问量能达到数百次或更多?
另外,我可以按这3个表中的任何列排序,在我的示例中,我是按上次看到的日期排序(默认顺序)。
让我们集中精力优化
WHERE
,然后钉住last_seen_date
在任何索引的末尾。rkue9o1l3#
你有太多的索引,这可能是一个真正的性能杀手当涉及到插入,更新和删除,以及偶尔选择取决于优化设置。
同时,拆下
GROUP BY
声明。关于正确使用聚集索引和非聚集索引,我还有更多要说的,
GROUP BY
,ORDER BY
,WHERE
,和视图,以进行查询优化。但是,我认为如果删除一些索引,查询速度会大大加快(也可以修改查询以遵循更严格的sql标准,使其更符合逻辑,但这超出了本问题的范围。)还有一件事-你对查询结果做了什么?它是否被存储在某个地方并被访问以进行查找、用于计算、用于自动报告、通过web数据库连接显示等?这是有区别的,因为如果您只需要一个报表/备份或导出到一个平面文件,那么就有更有效的方法来获取这些数据。很多不同的选择取决于你在做什么。
cgvd09ve4#
查询2包含其他人指出的逻辑错误:
count(distinct(c.shopify_customer_id))
将返回单个值,因此您的groupby只会使查询复杂化(这可能确实会使mysql首先通过shopify\u customer\u id进行分组,然后执行count(distinct(shopify_customer_id ))
这可能是执行时间长的原因查询3的order by无法优化,因为您正在加入无法索引的子选择。它所花费的时间只是系统排序结果集所需的时间。
解决问题的方法是:
更改索引
shopify_customer_id
(shopify_customer_id
)表tbl\u customers\u address toshopify_customer_id
(shopify_customer_id
,default
)优化以下查询使用查询1的结果(result)创建一个表,但不使用
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
.更改结果表,为last\u seen\u date添加一列,并为last\u seen\u date和shopify\u customer\u id添加索引
为此查询的结果创建表(最后一天):
SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id
使用表last\u date中的值更新结果表现在,您可以使用创建的索引对按最后一天排序的结果表运行查询。
整个过程所用的时间应该比执行查询2或查询3少