使用左连接和groupby-消除重复

xxb16uws 于 2021-07-27 发布在 Java

关注(0)|答案(2)|浏览(270)

我有两个表，我想连接在一起，并将其分组，以获得客户的成员信息。我的代码适用于连接，但当我尝试对列求和和分组时，它开始中断，我不知道为什么。

BASE TABLE : sales_detail 
+-------+-----------+-----------+-----------------------------------------+
|   order_date | transaction_id|   product_cost |  payment_type  |    country
+-------+-----------+-----------+------------------------------------------+
|   10/1     |   12345         |      20       |      mastercard |    usa
|   10/1     |   12345         |      50       |      mastercard |    usa
|   10/5     |  82456          |      50       |      mastercard |    usa
|   10/9     |  64789          |      30       |      visa       |    canada
|   10/15    |  08546          |      20       |      mastercard |    usa
|   10/15    |  08546          |      90       |      mastercard |    usa
|   10/17    |  65898          |       50       |      mastercard |   usa
+-------+-----------+-----------+-------------------------------------+

table : client_information
+-------+-----------+-----------+-------------------+
|   transaction_ID | client_Type|   membership  
+-------+-----------+-----------+----------+
|   12345        |   new         | vip         |
|   12345        |   new         | vip         |
|   82456        |   old         | normal      | 
|   08157        |   old         | vip         |
|   08546        |   old         | normal      |  
|   08546        |   old         | normal      |
|   65898        |   new         | vip         |
|   06587        |   new         | vip         |
+-------+-----------+-----------+-----------+

我希望输出像这样：

IDEAL OUTPUT
+-------+-----------+-----------+--------------------------------------------+
|   order_date | transaction_ID |   product_cost |  client_Type|   membership 
+-------+-----------+-----------+--------------------------------------------+
|   10/1     |   12345         |      70       |      new        |   vip     |
|   10/12    |   82456         |      50       |      old        |   normal  |
|   10/15    |   08546         |      110      |      old        |   normal  |
|   10/17    |   65898         |      50       |      new        |   vip  |
+-------+-----------+-----------+--------------------------------------------+

我正在尝试按事务id累加产品成本，这样我就可以按客户类型或成员身份聚合产品成本，而无需重复，因为每个事务id代表一个项目
这是我使用的代码，但它无法对我需要的项目进行分组：

select t1.order_date ,t1.transaction_ID,sum(t1.product_cost), t2.client_type, t2.membership
from sales_detail  t1
inner join client_information t2 on t1.transaction_ID=t2.transaction_ID 
where t1.payment_type='mastercard' and t1.order_date between '2020-01-02' and'2020-02-15'
and country_of_origin != 'canada'
GROUP BY t1.transaction_ID;

提前谢谢！我是一个初学者，所以仍在学习sql的来龙去脉(我正在使用Hive）

sql mysql sqlperformance

来源：https://stackoverflow.com/questions/63184368/using-using-left-joins-and-group-by-eliminate-duplicates

2条答案

按热度按时间

i86rm4rw1#

正如注解中提到的，表结构很奇怪，其中client\u information表将有多行。但是，如果是这样的话，那么您需要不同的值来避免重复。
您的mysql版本是否支持“with”子句？我不是mysql用户，但这在postgres上很管用。你可能需要稍微调整一下mysql，不确定。

with client_info as 
    (select distinct transaction_id, client_type, membership
    from client_information
    )
    select s.order_date, s.transaction_id, c.client_type, c.membership, sum(s.product_cost) as total_cost
    from sales_detail s
    join client_info c 
        on c.transaction_id = s.transaction_id
    where s.payment_type = 'mastercard'
    and s.country <> 'canada' --assuming country is always populated
    and --input your date logic
    group by s.order_date, s.transaction_id, c.client_type, c.membership

结果：

order_date;trans_id;client_type;membership;total_cost

    10/1;12345;new;vip;70

    10/15;8546;old;normal;110

    10/17;65898;new;vip;50

    10/5;82456;old;normal;50

赞(0）回复(0）举报 2021-07-27

kiz8lqtg2#

你不能准确地得到我认为你想要的，这就是原因。当您保存sales detail行并在client information表中创建看似1:1的行时，没有任何内容明确指出sales detail中的第一行与client information中的第一行匹配，second与second匹配，这允许您获得笛卡尔结果，因为唯一匹配的是id。现在，事务表中的两行显示相同的id、类型和成员身份很方便，但它是否会是具有相同id但不同类型/成员身份的单个事务？我不这么认为，但是您只在行上显示了足够的细节来覆盖查询，而不是完整的（这是可以的）。
总之，由于用户在其示例中使用“with”构造提供了“隔离”，这只不过是在主查询中直接屏蔽一个查询。您需要首先从事务中获取不同的组件，然后才能汇总数据。
也就是说，这里是不使用“with”上下文的查询。

select
      max( sd.order_date ) Order_Date,
      sd.transaction_id,
      sum( sd.product_cost ) TransactionTotalCost,
      max( ci.client_type ) Client_Type,
      max( ci.membership ) Membership
   from
      sales_detail sd
         JOIN ( select distinct
                      transaction_id,
                      client_type,
                      membership
                   from
                      client_information ) ci
            on sd.transaction_id = ci.transaction_id
   where
          sd.payment_type='mastercard' 
      and sd.order_date between '2020-01-02' and'2020-02-15'
      and sd.country != 'canada'
   group by
      sd.transaction_id

现在，上面的方法可以了，但我还有一些额外的建议。根据数据的大小，内部连接将获取所有不同的值，即针对整个数据获取不同的值，而不仅仅是讨论中的日期。为了防止这种情况发生，我将使这个内部查询成为连接到sales detail表的预查询结果，这样我只得到限定日期内的那些事务，从而得到一个较小的集合。然后连接到sales detail表。我只建议并假设sales detail表应该在事务id上有一个索引以及其他索引，以优化连接。
最后，只是为了将来的考虑，查询日期。使用“between”子句可能会导致一些问题，产生意想不到的结果。如果你的日期栏只是一个日期，没问题。但是，如果日期字段是日期/时间，那么查询'2020-01-02'和'2020-02-15'不一定会得到日期时间为'2020-02-15 14:27:35'的事务，因为下午2:27是在上午12:00:00隐含的2020-02-15之后，因此可能会忽略记录。我的偏好是做>=开始日期和<计划日期的次日。。您将在查询中看到。

select
      max( sd2.order_date ) Order_Date,
      preQuery.transaction_id,
      sum( sd2.product_cost ) TransactionTotalCost,
      max( preQuery.client_type ) Client_Type,
      max( preQuery.membership ) Membership
   from
      ( select distinct
              sd.transaction_id,
              ci.client_type,
              ci.membership
            from
               sales_detail sd
                  JOIN client_information ci
                     on sd.transaction_id = ci.transaction_id
            where
                   sd.payment_type='mastercard' 
               and sd.order_date >= '2020-01-02' 
               and sd.order_Date < '2020-02-16'
               and sd.country != 'canada'
      ) preQuery
         JOIN sales_detail sd2
            on preQuery.Transaction_ID = sd2.Transaction_ID
   group by
      preQuery.transaction_id

因此，在上面，内部预查询预先限定了关于支付类型、日期范围的所有细节，而不是加拿大。只有这样，它才会返回到实际的销售细节（通过第二个别名sd2）以获得成本和订单日期的总和。

赞(0）回复(0）举报 2021-07-27

我来回答

使用左连接和groupby-消除重复

2条答案

相关问题

热门标签

最新问答