mysql 请解释这个简单的SQL查询的执行过程

idfiyjo8  于 2023-08-02  发布在  Mysql
关注(0)|答案(2)|浏览(86)

用于创建此表的DDL(& D):

Create table If Not Exists Delivery (delivery_id int, customer_id int, order_date date, customer_pref_delivery_date date);
Truncate table Delivery;
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values ('1', '1', '2019-08-01', '2019-08-02');
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values ('2', '2', '2019-08-02', '2019-08-02');
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values ('3', '1', '2019-08-11', '2019-08-12');
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values ('4', '3', '2019-08-24', '2019-08-24');
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values ('5', '3', '2019-08-21', '2019-08-22');
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values ('6', '2', '2019-08-11', '2019-08-13');
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values ('7', '4', '2019-08-09', '2019-08-09');

字符串
x1c 0d1x的数据
我想按照customer_id的升序重新排列表中的行,对于同一customer_id的多行,按照order_date的升序重新排列。为了得到这个,我写了一个查询:

with t1 as (select *
            from delivery
            order by customer_id, order_date),
t2 as (select * from t1 group by customer_id)
select * from t2;


我得到了这张table


这里是第三排,而不是2019-08-24,我期待的是2019-08-21。请解释我在此解决方案中做错了什么?请解释为什么未按照规定的CTE发生重排?

nxagd54h

nxagd54h1#

使用合并或实体化状态优化派生表、视图引用和公用表表达式:
如果这些条件都为真,优化器将派生表或视图引用中的ORDER BY子句传播到外部查询块:

  • 外部查询不进行分组或聚合。
  • 外部查询不指定DISTINCT、HAVING或ORDER BY。
  • 外部查询将此派生表或视图引用作为FROM子句中的唯一源。

否则,优化器将忽略ORDER BY子句。
因为t1groupedt2中,the optimizer ignores the ORDER BY clause
你的查询和SickerDude43提出的查询都是不确定的。它们只会返回,因为ONLY_FULL_GROUP_BY被禁用。所有选定的列(和表达式)都应位于聚合函数中,或在功能上依赖于GROUP BY子句。
您应该阅读MySQL Handling of GROUP BY和ONLY_FULL_GROUP_BY。
以下是greatest-n-per-group的一些典型解决方案:

-- greatest (or least) per group using aggregate subquery
-- this will return multiple rows for a customer_id if there are multiple orders on min_order_date
SELECT d.*
FROM (
    SELECT customer_id, MIN(order_date) AS min_order_date
    FROM delivery
    GROUP BY customer_id
) d_min
JOIN delivery d
    ON d_min.customer_id = d.customer_id
    AND d_min.min_order_date = d.order_date;
-- greatest (or least) per group using correlated subquery
SELECT d1.*
FROM delivery d1
WHERE delivery_id = (
    SELECT delivery_id
    FROM delivery d2
    WHERE d1.customer_id = d2.customer_id
    ORDER BY customer_id, order_date, delivery_id
    LIMIT 1
);
-- greatest (or least) per group using ROW_NUMBER() window function (MySQL >= 8.0)
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date, delivery_id) AS rn
    FROM delivery
) d
WHERE rn = 1;

聚合和相关子查询将受益于(customer_id, order_date)上的索引。
哪一个是最快的将取决于您的数据分布。给予看。
这是一个db<>fiddle

rks48beu

rks48beu2#

你的问题太复杂了。
只需使用MIN()函数选择订单日期的最低值。

select delivery_id, customer_id, MIN(order_date), 
customer_pref_delivery_date from delivery
group by customer_id

字符串
这个应该能用
小提琴:http://sqlfiddle.com/#!9/4a8a51/1
您的解决方案不起作用的原因在注解中解释。

相关问题