sql—意外的行数不正确的连接?

jmp7cifd  于 2021-06-26  发布在  Impala
关注(0)|答案(1)|浏览(307)

我被这个难住了。我是一个初学者,也许我已经太深入的加入。作为一种方法,我为结果的不同函数分别编写了查询。我现在正在尝试合并,它正在中断。
我用impala进行查询,并在mysql中创建表,然后用hive导入hdfs。
有效的查询是:

SELECT carrier, zone, speed, min_price, price
FROM (SELECT carrier, total_wt, zone, speed, price, MIN(price) OVER(PARTITION BY speed) as min_price
      FROM shipping_prices
      WHERE total_wt IN (
    SELECT
        SUM(prod.shipping_wt)
        FROM order_details ordet
        JOIN products prod
        ON prod.prod_id = ordet.prod_id
        WHERE ordet.order_id = 5841506
        GROUP BY ordet.order_id)
    AND zone IN (
    SELECT cal_zone
      FROM (
        SELECT carrier, dest_zip, origin_zip, zone, MIN(zone) OVER(PARTITION BY carrier) as cal_zone
          FROM shipping_zones
            WHERE (origin_zip = 402 OR origin_zip = 950) AND dest_zip = '560'
          ) t
    WHERE zone=cal_zone)
      ) z
WHERE price=min_price
ORDER BY speed DESC;

这将返回:

+---------+------+-------+-----------+-------+
| carrier | zone | speed | min_price | price |
+---------+------+-------+-----------+-------+
| fedex   | 4    | slow  | 10.86     | 10.86 |
| usps    | 4    | med   | 11.15     | 11.15 |
| usps    | 4    | fast  | 40.55     | 40.55 |
+---------+------+-------+-----------+-------+

我遇到的问题似乎是在最终结果之前的步骤失败了,当它应该找到9个价格合计,每个运营商一个级别。相反,它返回27,所以它发现每个运营商每个级别有9个价格。这是没有意义的,因为每个运营商每个级别只有3个价格。以下是查询:

SELECT lev5.zone, lev5.cust_id, lev5.total_wt, lev5.zip_shrt, lev5.order_id, lev5.carrier, pric.speed, pric.price, MIN(price) OVER(PARTITION BY speed) as min_price
FROM (
      SELECT lev4.cust_id, lev4.total_wt, lev4.zip_shrt, lev4.order_id, lev4.carrier, lev4.zone
      FROM (
             SELECT lev3.cust_id, lev3.total_wt, lev3.zip_shrt, lev3.order_id, zon.carrier, zon.origin_zip, zon.dest_zip, zon.zone, MIN(zon.zone) OVER(PARTITION BY zon.carrier) as calc_zone
             FROM (
                    SELECT lev2.cust_id, SUM(shipping_wt)/2 AS total_wt, STRLEFT(lev2.zipcode, 3) AS zip_shrt, lev2.order_id
                    FROM (
                           SELECT lev1.zipcode, lev1.cust_id, lev1.order_id, ordet.prod_id
                       FROM (
                          SELECT cus.zipcode, cus.cust_id, ord.order_id
                          FROM orders ord
                          JOIN customers cus ON ord.cust_id = cus.cust_id
                          WHERE ord.order_id = 5841506 
                            ) lev1
                       JOIN order_details ordet
                       ON lev1.order_id = ordet.order_id
                       WHERE ordet.order_id = lev1.order_id
                         ) lev2
                    JOIN products prod
                    ON lev2.prod_id = prod.prod_id
                    GROUP BY lev2.cust_id, zip_shrt, lev2.order_id
                  ) lev3
             JOIN shipping_zones zon
             ON lev3.zip_shrt = zon.dest_zip
             WHERE (origin_zip = 402 OR origin_zip = 950) AND dest_zip = lev3.zip_shrt
           ) lev4
      WHERE lev4.zone = calc_zone
     ) lev5
JOIN shipping_prices pric
ON lev5.zone = pric.zone AND lev5.total_wt = pric.total_wt;

这将返回:

+------+---------+----------+----------+----------+---------+-------+-------------------+-----------+
| zone | cust_id | total_wt | zip_shrt | order_id | carrier | speed | price             | min_price |
+------+---------+----------+----------+----------+---------+-------+-------------------+-----------+
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | med   | 22.24             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | med   | 23.57             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | med   | 11.15             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | med   | 11.15             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | med   | 22.24             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | med   | 23.57             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | med   | 23.57             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | med   | 11.15             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | med   | 22.24             | 11.15     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | slow  | 11.15             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | slow  | 11.15             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | slow  | 10.86             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | slow  | 11.15             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | slow  | 11.15             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | slow  | 10.86             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | slow  | 11.15             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | slow  | 11.15             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | slow  | 10.86             | 10.86     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | fast  | 69.93000000000001 | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | fast  | 40.55             | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | ups     | fast  | 65.94             | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | fast  | 69.93000000000001 | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | fast  | 40.55             | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | usps    | fast  | 65.94             | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | fast  | 69.93000000000001 | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | fast  | 40.55             | 40.55     |
| 4    | 1050349 | 4        | 560      | 5841506  | fedex   | fast  | 65.94             | 40.55     |
+------+---------+----------+----------+----------+---------+-------+-------------------+-----------+

我进入excel来确认(船运价格的源数据),并做了countifs函数,它只返回9行,因为它应该。
我非常感谢你的帮助。我真的一直在试图自己解决这个问题,但也许是因为我对连接缺乏知识,或者其他一些我没有经验去发现的问题。我真的为自己能走到这一步感到骄傲,但这个路障快把我累死了。
谢谢您

sg3maiej

sg3maiej1#

这就是我最终得到的结果,给了我预期的结果:

SELECT lev6.order_id, lev6.carrier, lev6.speed, lev6.min_price AS cheapest_price
FROM (
SELECT lev5.zone, lev5.cust_id, lev5.total_wt, lev5.zip_shrt, lev5.order_id, lev5.origin_zip, lev5.carrier, pric.speed, pric.price, MIN(pric.price) OVER(PARTITION BY speed) as min_price
FROM (
      SELECT lev4.cust_id, lev4.total_wt, lev4.zip_shrt, lev4.order_id, lev4.carrier, lev4.zone, lev4.origin_zip
      FROM (
             SELECT lev3.cust_id, lev3.total_wt, lev3.zip_shrt, lev3.order_id, zon.carrier, zon.origin_zip, zon.dest_zip, zon.zone, MIN(zon.zone) OVER(PARTITION BY zon.carrier) as calc_zone
             FROM (
                    SELECT lev2.cust_id, SUM(shipping_wt)/2 AS total_wt, STRLEFT(lev2.zipcode, 3) AS zip_shrt, lev2.order_id
                    FROM (
                           SELECT lev1.zipcode, lev1.cust_id, lev1.order_id, ordet.prod_id
                       FROM (
                          SELECT cus.zipcode, cus.cust_id, ord.order_id
                          FROM orders ord
                          JOIN customers cus ON ord.cust_id = cus.cust_id
                          WHERE ord.order_id = 6453746
                            ) lev1
                       JOIN order_details ordet
                       ON lev1.order_id = ordet.order_id
                       WHERE ordet.order_id = lev1.order_id
                         ) lev2
                    JOIN products prod
                    ON lev2.prod_id = prod.prod_id
                    GROUP BY lev2.cust_id, zip_shrt, lev2.order_id
                  ) lev3
             JOIN shipping_zones zon
             ON lev3.zip_shrt = zon.dest_zip
             WHERE (origin_zip = 402 OR origin_zip = 950) AND dest_zip = lev3.zip_shrt
           ) lev4
      WHERE lev4.zone = calc_zone
     ) lev5
JOIN shipping_prices pric
ON lev5.zone = pric.zone AND lev5.total_wt = pric.total_wt
WHERE lev5.carrier = pric.carrier
) lev6
WHERE lev6.price = lev6.min_price
ORDER BY lev6.speed DESC;

我的问题是我的级别5没有where条件。我补充说:

WHERE lev5.carrier = pric.carrier

这消除了重复和正确返回只有9个运费价格,从每个承运人(慢,中,快)3。然后我添加了lev6,只返回每种速度的最低价格。
@托尔斯滕凯特纳至于我为什么要把产品的重量之和除以2,不幸的是我没有答案。如果我不这样做,它会返回订单实际重量的两倍,不知道为什么。
这个查询不是防弹的,也没有测试过空值和所有情况(比如当两个价格在相同的速度下相等时),但它已经足够好了,我想做什么,并演示了一个概念证明。

相关问题