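I have a table, table1, whose primary key is PRIMARY KEY (year, month, id).
As I understand this primary key, rows are stored in the clustered-index B-tree ordered by year, then month, then id, so rows for the same year and month sit next to each other, for example: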
(2021, 12, 1)
(2022, 12, 1)
(2022, 12, 2)
(2023, 1, 1)
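The ordering described above can be modeled outside MySQL. The following is a minimal Python sketch, not InnoDB's actual B-tree: it sorts hypothetical (year, month, id) tuples the way a composite key is compared, then uses binary search to show that all rows for one (year, month) form a single contiguous range. The helper name rows_for_month is made up for illustration.

```python
from bisect import bisect_left

# Hypothetical rows as (year, month, id) primary-key tuples, inserted out of order.
keys = [(2023, 1, 1), (2022, 12, 2), (2021, 12, 1), (2022, 12, 1)]

# A clustered index keeps rows in primary-key order. Python compares tuples
# element by element (year, then month, then id), which matches that ordering.
clustered = sorted(keys)

def rows_for_month(index, year, month):
    """Return the contiguous slice of keys sharing one (year, month) prefix."""
    lo = bisect_left(index, (year, month))      # first key >= (year, month)
    hi = bisect_left(index, (year, month + 1))  # first key past that month
    return index[lo:hi]

print(clustered)
print(rows_for_month(clustered, 2022, 12))
```

In this model, a lookup by (year, month) is two binary searches followed by a sequential read, which is the property the question relies on.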
CREATE TABLE `table1` (
`id` int AUTO_INCREMENT NOT NULL,
`entity_id` varchar(36) NOT NULL,
`entity_type` varchar(36) NOT NULL,
`score` decimal(4,3) NOT NULL,
`raw` json DEFAULT NULL,
`month` int NOT NULL,
`year` int NOT NULL,
`date` DATE NOT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`year`, `month`, `id`),
KEY (`id`),
KEY `table1_indx` (`year`, `month`,`score`,`entity_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
If a query filters mainly by date (month and year), I expect it to be efficient, because the matching rows are clustered and stored together.
EXPLAIN
SELECT
table1.entity_id AS entity_id,
table1.entity_type,
table1.score
FROM table1
WHERE table1.month = 12
AND table1.year = 2022
AND table1.score > 0
AND table1.entity_type IN ('type1', 'type2', 'type3', 'type4');
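To make the role of the column order in table1_indx (year, month, score, entity_type) concrete, here is a small Python model. It is only a sketch under assumptions: the index entries are invented, the function name scan is hypothetical, and it does not claim that the MySQL optimizer would pick this index for the query above. It shows that equality on year and month plus a range on score selects one contiguous run of entries, while entity_type, which comes after the range column, is checked entry by entry inside that run.

```python
from bisect import bisect_left, bisect_right
from decimal import Decimal

# Hypothetical index entries shaped like table1_indx:
# (year, month, score, entity_type), kept in sorted order.
entries = sorted([
    (2022, 11, Decimal("0.900"), "type1"),
    (2022, 12, Decimal("0.000"), "type1"),
    (2022, 12, Decimal("0.250"), "type9"),
    (2022, 12, Decimal("0.500"), "type2"),
    (2023, 1, Decimal("0.700"), "type3"),
])
# The first three columns only, used as the search key for the range bounds.
prefix = [entry[:3] for entry in entries]

def scan(year, month, min_score, types):
    """Return (entries examined, entries matched) for one month."""
    # year = ?, month = ?, score > ? maps to a single contiguous range.
    lo = bisect_right(prefix, (year, month, min_score))
    hi = bisect_left(prefix, (year, month + 1))
    examined = entries[lo:hi]
    # entity_type follows the range column, so it is filtered per entry.
    matched = [entry for entry in examined if entry[3] in types]
    return examined, matched

examined, matched = scan(2022, 12, Decimal("0"), {"type1", "type2", "type3", "type4"})
print(len(examined), matched)
```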
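If my assumptions above are correct and the data really is organized that way, what would be different if the table were partitioned by year and subpartitioned by month?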
CREATE TABLE `table1` (
`id` int AUTO_INCREMENT NOT NULL,
`entity_id` varchar(36) NOT NULL,
`entity_type` varchar(36) NOT NULL,
`score` decimal(4,3) NOT NULL,
`raw` json DEFAULT NULL,
`month` int NOT NULL,
`year` int NOT NULL,
`date` DATE NOT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`year`, `month`, `id`),
KEY (`id`),
KEY `table1_indx` (`year`, `month`,`score`,`entity_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY RANGE (`year`)
SUBPARTITION BY HASH (`month`)
(PARTITION p2021 VALUES LESS THAN (2022)
(SUBPARTITION dec_2021 ENGINE = InnoDB,
SUBPARTITION jan_2021 ENGINE = InnoDB,
SUBPARTITION feb_2021 ENGINE = InnoDB,
SUBPARTITION mar_2021 ENGINE = InnoDB,
SUBPARTITION apr_2021 ENGINE = InnoDB,
SUBPARTITION may_2021 ENGINE = InnoDB,
SUBPARTITION jun_2021 ENGINE = InnoDB,
SUBPARTITION jul_2021 ENGINE = InnoDB,
SUBPARTITION aug_2021 ENGINE = InnoDB,
SUBPARTITION sep_2021 ENGINE = InnoDB,
SUBPARTITION oct_2021 ENGINE = InnoDB,
SUBPARTITION nov_2021 ENGINE = InnoDB),
PARTITION p2022 VALUES LESS THAN (2023)
(SUBPARTITION dec_2022 ENGINE = InnoDB,
SUBPARTITION jan_2022 ENGINE = InnoDB,
SUBPARTITION feb_2022 ENGINE = InnoDB,
SUBPARTITION mar_2022 ENGINE = InnoDB,
SUBPARTITION apr_2022 ENGINE = InnoDB,
SUBPARTITION may_2022 ENGINE = InnoDB,
SUBPARTITION jun_2022 ENGINE = InnoDB,
SUBPARTITION jul_2022 ENGINE = InnoDB,
SUBPARTITION aug_2022 ENGINE = InnoDB,
SUBPARTITION sep_2022 ENGINE = InnoDB,
SUBPARTITION oct_2022 ENGINE = InnoDB,
SUBPARTITION nov_2022 ENGINE = InnoDB),
PARTITION p2023 VALUES LESS THAN (2024)
(SUBPARTITION dec_2023 ENGINE = InnoDB,
SUBPARTITION jan_2023 ENGINE = InnoDB,
SUBPARTITION feb_2023 ENGINE = InnoDB,
SUBPARTITION mar_2023 ENGINE = InnoDB,
SUBPARTITION apr_2023 ENGINE = InnoDB,
SUBPARTITION may_2023 ENGINE = InnoDB,
SUBPARTITION jun_2023 ENGINE = InnoDB,
SUBPARTITION jul_2023 ENGINE = InnoDB,
SUBPARTITION aug_2023 ENGINE = InnoDB,
SUBPARTITION sep_2023 ENGINE = InnoDB,
SUBPARTITION oct_2023 ENGINE = InnoDB,
SUBPARTITION nov_2023 ENGINE = InnoDB),
PARTITION pmax VALUES LESS THAN MAXVALUE
(SUBPARTITION dec_max ENGINE = InnoDB,
SUBPARTITION jan_max ENGINE = InnoDB,
SUBPARTITION feb_max ENGINE = InnoDB,
SUBPARTITION mar_max ENGINE = InnoDB,
SUBPARTITION apr_max ENGINE = InnoDB,
SUBPARTITION may_max ENGINE = InnoDB,
SUBPARTITION jun_max ENGINE = InnoDB,
SUBPARTITION jul_max ENGINE = InnoDB,
SUBPARTITION aug_max ENGINE = InnoDB,
SUBPARTITION sep_max ENGINE = InnoDB,
SUBPARTITION oct_max ENGINE = InnoDB,
SUBPARTITION nov_max ENGINE = InnoDB));
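The subpartition names above only describe the data if each month lands in the slot with the matching name. The Python sketch below models that placement under two assumptions that should be checked against the MySQL manual: HASH on an integer column with 12 subpartitions selects slot MOD(month, 12), and slots follow declaration order. The names SLOT_PREFIXES, RANGES and locate are hypothetical.

```python
# Name prefixes in the order the subpartitions are declared in the DDL above.
SLOT_PREFIXES = ["dec", "jan", "feb", "mar", "apr", "may",
                 "jun", "jul", "aug", "sep", "oct", "nov"]

# RANGE bounds from the DDL: (exclusive upper bound, partition, name suffix).
RANGES = [(2022, "p2021", "2021"), (2023, "p2022", "2022"), (2024, "p2023", "2023")]

def locate(year, month):
    """Return (partition, subpartition) for a row, under the stated assumptions."""
    partition, suffix = "pmax", "max"  # VALUES LESS THAN MAXVALUE catches the rest
    for upper, name, name_suffix in RANGES:
        if year < upper:
            partition, suffix = name, name_suffix
            break
    # Assumption: HASH(month) with 12 subpartitions selects slot month MOD 12.
    slot = month % len(SLOT_PREFIXES)
    return partition, f"{SLOT_PREFIXES[slot]}_{suffix}"

print(locate(2022, 12))
print(locate(2019, 3))
```

Under these assumptions month 12 maps to slot 0, which is why declaring the dec_ subpartition first lines the names up with the data. A year earlier than 2021 also falls into p2021, because that partition has no lower bound.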
Ultimately, I am trying to understand PRIMARY KEY (year, month, id) and whether partitioning the data on top of it is "worth it".
1 Answer
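Think of partitions as separate tables. In fact, if you use innodb_file_per_table, each partition is stored in its own tablespace file under the data directory. Each partition has its own clustered-index B-tree, just like any table.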
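Even if your query benefits from partition pruning, so that only a single partition is scanned, that partition may still be large enough that a linear scan of it takes too long.
Once partition pruning has reduced the set of partitions to scan, each remaining partition can use its indexes. The optimizer makes that choice exactly as it would for a non-partitioned table. Using an index reduces the number of rows examined within each partition.
So whether or not you use partitioning, using indexes to optimize queries is clearly an advantage.
This raises the question of whether partitioning is worth it when you already have indexes that optimize the query. In practice, in many cases indexes alone, without partitioning, give sufficient optimization. An index narrows the search to only the matching rows, just as partitioning does.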
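The comparison in this answer can be illustrated with a toy Python model. This is a sketch with invented data, not a benchmark and not how MySQL is implemented: each partition is modeled as its own sorted list, the unpartitioned table as one sorted list of (year, month, score) tuples, and the "examined" count ignores the cost of descending the tree. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows as (year, month, score_in_thousandths).
rows = [(2022, 12, s) for s in range(0, 1000, 100)]
rows += [(2022, 11, 500), (2023, 1, 900)]

# Partitioned layout: one sorted list per (year, month), like one B-tree each.
partitions = {}
for year, month, score in rows:
    partitions.setdefault((year, month), []).append(score)
for scores in partitions.values():
    scores.sort()

# Unpartitioned layout: a single composite index over all rows.
composite = sorted(rows)

def pruned_scan(year, month, min_score):
    """Pruning only: every row in the chosen partition is examined."""
    part = partitions.get((year, month), [])
    return len(part), [s for s in part if s > min_score]

def pruned_index(year, month, min_score):
    """Pruning, then a binary search inside that partition's own index."""
    part = partitions.get((year, month), [])
    hits = part[bisect_right(part, min_score):]
    return len(hits), hits

def index_only(year, month, min_score):
    """No partitions: one range lookup on (year, month, score)."""
    lo = bisect_right(composite, (year, month, min_score))
    hi = bisect_left(composite, (year, month + 1))
    hits = [score for _, _, score in composite[lo:hi]]
    return len(hits), hits

for strategy in (pruned_scan, pruned_index, index_only):
    examined, hits = strategy(2022, 12, 700)
    print(strategy.__name__, examined, hits)
```

In this model, pruning alone examines all 10 rows of the December 2022 partition, while both index-based strategies examine only the 2 matching rows, with or without partitions.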