What is the difference between a clustered primary index and partitioning for organizing data in a MySQL table?

Posted by e5nszbig on 2023-02-21 in MySQL

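I have a table, table1, whose primary key is PRIMARY (year, month, id).
As I understand this primary key, the rows in the B-tree are stored next to each other, ordered by year, then month, then id:

(2021, 12, 1)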
(2022, 12, 1)
(2022, 12, 2)
(2023, 1, 1)
CREATE TABLE `table1` (
  `id` int AUTO_INCREMENT NOT NULL,
  `entity_id` varchar(36) NOT NULL,
  `entity_type` varchar(36) NOT NULL,
  `score` decimal(4,3) NOT NULL,
  `raw` json DEFAULT NULL,
  `month` int NOT NULL,
  `year` int NOT NULL,
  `date` DATE NOT NULL,
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`year`, `month`, `id`),
  KEY (`id`),
  KEY `table1_indx` (`year`, `month`,`score`,`entity_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

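If I have a query that mainly searches by date (month, year), it should be efficient, because the matching rows are clustered and stored together.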

EXPLAIN
SELECT
  table1.entity_id AS entity_id,
  table1.entity_type,
  table1.score
FROM table1
WHERE table1.month = 12
  AND table1.year = 2022
  AND table1.score > 0
  AND table1.entity_type IN ('type1', 'type2', 'type3', 'type4');
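If my assumptions above are correct and the data really is organized this way, what would be different if the table were partitioned by year and subpartitioned by month?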
CREATE TABLE `table1` (
  `id` int AUTO_INCREMENT NOT NULL,
  `entity_id` varchar(36) NOT NULL,
  `entity_type` varchar(36) NOT NULL,
  `score` decimal(4,3) NOT NULL,
  `raw` json DEFAULT NULL,
  `month` int NOT NULL,
  `year` int NOT NULL,
  `date` DATE NOT NULL,
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`year`, `month`, `id`),
  KEY (`id`),
  KEY `table1_indx` (`year`, `month`,`score`,`entity_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY RANGE (`year`)
SUBPARTITION BY HASH (`month`)
(PARTITION p2021 VALUES LESS THAN (2022)
 (SUBPARTITION dec_2021 ENGINE = InnoDB,
  SUBPARTITION jan_2021 ENGINE = InnoDB,
  SUBPARTITION feb_2021 ENGINE = InnoDB,
  SUBPARTITION mar_2021 ENGINE = InnoDB,
  SUBPARTITION apr_2021 ENGINE = InnoDB,
  SUBPARTITION may_2021 ENGINE = InnoDB,
  SUBPARTITION jun_2021 ENGINE = InnoDB,
  SUBPARTITION jul_2021 ENGINE = InnoDB,
  SUBPARTITION aug_2021 ENGINE = InnoDB,
  SUBPARTITION sep_2021 ENGINE = InnoDB,
  SUBPARTITION oct_2021 ENGINE = InnoDB,
  SUBPARTITION nov_2021 ENGINE = InnoDB),
 PARTITION p2022 VALUES LESS THAN (2023)
 (SUBPARTITION dec_2022 ENGINE = InnoDB,
  SUBPARTITION jan_2022 ENGINE = InnoDB,
  SUBPARTITION feb_2022 ENGINE = InnoDB,
  SUBPARTITION mar_2022 ENGINE = InnoDB,
  SUBPARTITION apr_2022 ENGINE = InnoDB,
  SUBPARTITION may_2022 ENGINE = InnoDB,
  SUBPARTITION jun_2022 ENGINE = InnoDB,
  SUBPARTITION jul_2022 ENGINE = InnoDB,
  SUBPARTITION aug_2022 ENGINE = InnoDB,
  SUBPARTITION sep_2022 ENGINE = InnoDB,
  SUBPARTITION oct_2022 ENGINE = InnoDB,
  SUBPARTITION nov_2022 ENGINE = InnoDB),
 PARTITION p2023 VALUES LESS THAN (2024)
 (SUBPARTITION dec_2023 ENGINE = InnoDB,
  SUBPARTITION jan_2023 ENGINE = InnoDB,
  SUBPARTITION feb_2023 ENGINE = InnoDB,
  SUBPARTITION mar_2023 ENGINE = InnoDB,
  SUBPARTITION apr_2023 ENGINE = InnoDB,
  SUBPARTITION may_2023 ENGINE = InnoDB,
  SUBPARTITION jun_2023 ENGINE = InnoDB,
  SUBPARTITION jul_2023 ENGINE = InnoDB,
  SUBPARTITION aug_2023 ENGINE = InnoDB,
  SUBPARTITION sep_2023 ENGINE = InnoDB,
  SUBPARTITION oct_2023 ENGINE = InnoDB,
  SUBPARTITION nov_2023 ENGINE = InnoDB),
 PARTITION pmax VALUES LESS THAN MAXVALUE
 (SUBPARTITION dec_max ENGINE = InnoDB,
  SUBPARTITION jan_max ENGINE = InnoDB,
  SUBPARTITION feb_max ENGINE = InnoDB,
  SUBPARTITION mar_max ENGINE = InnoDB,
  SUBPARTITION apr_max ENGINE = InnoDB,
  SUBPARTITION may_max ENGINE = InnoDB,
  SUBPARTITION jun_max ENGINE = InnoDB,
  SUBPARTITION jul_max ENGINE = InnoDB,
  SUBPARTITION aug_max ENGINE = InnoDB,
  SUBPARTITION sep_max ENGINE = InnoDB,
  SUBPARTITION oct_max ENGINE = InnoDB,
  SUBPARTITION nov_max ENGINE = InnoDB));

Ultimately, I am trying to understand whether it is "worth it" to partition the data in addition to having PRIMARY (year, month, id).

2jcobegt  #1

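Think of partitions as separate tables. In fact, if you use innodb_file_per_table, each partition is stored in its own tablespace file under the data directory.
Each partition has its own clustered index B-tree, just like any table.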
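
To see the "separate tables" view for yourself, you can list the partitions and subpartitions that MySQL tracks for the table. This is a minimal sketch that assumes the partitioned table1 from the question exists in the current schema. Note that TABLE_ROWS is only an estimate for InnoDB.

```sql
-- List every (partition, subpartition) pair of table1 in the current schema.
-- Each row corresponds to one physical unit with its own clustered index.
SELECT
  PARTITION_NAME,
  SUBPARTITION_NAME,
  PARTITION_METHOD,
  SUBPARTITION_METHOD,
  TABLE_ROWS            -- approximate row count for InnoDB
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'table1'
ORDER BY PARTITION_ORDINAL_POSITION, SUBPARTITION_ORDINAL_POSITION;
```

With the definition in the question, this should return 48 rows (4 partitions with 12 subpartitions each).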
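Even if your query benefits from partition pruning, so that only a single partition is scanned, that partition may still be large enough that a linear scan of it takes too long.
Once partition pruning has reduced the set of partitions to scan, an index can be used within each remaining partition. The optimizer chooses that index exactly as it would for a non-partitioned table. Using an index reduces the number of rows examined in each partition.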
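
You can check both effects with EXPLAIN. The sketch below reuses the query from the question against the partitioned table. In the output, the partitions column shows which partitions survive pruning, and the key column shows which index the optimizer picked inside them. The actual values depend on your data and MySQL version, so none are shown here.

```sql
-- Same query as in the question; run against the partitioned table1.
-- Look at two columns of the output:
--   partitions : the partitions/subpartitions left after pruning
--   key        : the index chosen within those partitions
EXPLAIN
SELECT
  table1.entity_id,
  table1.entity_type,
  table1.score
FROM table1
WHERE table1.year = 2022
  AND table1.month = 12
  AND table1.score > 0
  AND table1.entity_type IN ('type1', 'type2', 'type3', 'type4');
```

If the partitions column lists a single subpartition but the key column is NULL, pruning happened and the remaining partition is still being scanned in full.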
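So using an index to optimize your query is an advantage whether or not you use partitioning.
This raises the question of whether partitioning is worth it when you already have an index that optimizes the query. In many cases, indexes alone, without partitioning, provide sufficient optimization. An index narrows the search to only the matching rows, just as partitioning does.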
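
As an illustration of the index-only approach, here is one possible index for the query in the question, on the non-partitioned table. This is a sketch, not a tuned recommendation: the index name is made up, and the column order reflects the common guideline of placing equality and IN columns before the range column. entity_id is appended so that the query can, in principle, be answered from the index alone.

```sql
-- Hypothetical index name; the column order is an assumption based on the
-- query's predicates: year and month (equality), entity_type (IN),
-- score (range), plus entity_id so the index covers the SELECT list.
ALTER TABLE table1
  ADD KEY table1_ym_type_score (`year`, `month`, `entity_type`, `score`, `entity_id`);
```

Rerun EXPLAIN on the query afterwards to see whether the optimizer actually chooses the new index for your data.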
