SQL - use only clients that are present in all months

Posted by x6492ojm on 2021-07-26 in Java


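I have a dataset of different clients and their sales. Over time, some clients are added to and removed from the data. How can I make sure that, when I look at sales counts, I only use the subset of clients that has always been present in the dataset? If a client has no record for 2018-03, I don't want that client to be part of the query at all. Likewise, if a client has no record for 2020-03, I don't want that client to be part of the query either.
For example, the following query: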

select DATE_PART(y, sold_date) as year, DATE_PART(mm, sold_date) as month, count(distinct client)
from sales_data
where sold_date > '2018-01-01' 
group by year, month 
order by year,month

yields:

year    month   count
2018    1   78
2018    2   83
2018    3   80
2018    4   83
2018    5   84
2018    6   81
2018    7   83
2018    8   90
2018    9   89
2018    10  95
2018    11  94
2018    12  97
2019    1   102
2019    2   103
2019    3   102
2019    4   105
2019    5   103
2019    6   104
2019    7   104
2019    8   106
2019    9   106
2019    10  108
2019    11  109
2019    12  104
2020    1   104
2020    2   102
2020    3   103
2020    4   98
2020    5   97
2020    6   79

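So I only want to use the clients that are present in every month. There should be no more than 78 of them, because the result cannot exceed the smallest monthly count (2018-1).
FYI, I'm using Amazon Redshift here, but an RDBMS-agnostic query, or one that works on SQL Server/Oracle/MySQL/PostgreSQL, is fine too. I'm mainly interested in a pattern for solving this efficiently.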
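
The underlying pattern is a kind of relational division: a client qualifies only if it appears in every month of the period. A minimal in-memory sketch of the idea, assuming sales are available as (client, sold_date) pairs (the sample data and the helper names `clients_in_all_months` and `monthly_counts` are hypothetical), intersects the per-month client sets and then counts per month using only the surviving clients. Because the surviving set is a subset of every month's set, its size can never exceed the smallest monthly count, which is the "no more than 78" bound above. Like the answers below, it treats "every month" as every month that occurs in the data.

```python
from collections import defaultdict
from datetime import date


def clients_in_all_months(sales):
    """Return the clients that have at least one sale in every month
    that appears in `sales` (an iterable of (client, date) pairs)."""
    by_month = defaultdict(set)
    for client, sold_date in sales:
        by_month[(sold_date.year, sold_date.month)].add(client)
    if not by_month:
        return set()
    # A client survives only if it is in every month's set.
    return set.intersection(*by_month.values())


def monthly_counts(sales, keep):
    """Count distinct clients per (year, month), restricted to `keep`."""
    by_month = defaultdict(set)
    for client, sold_date in sales:
        if client in keep:
            by_month[(sold_date.year, sold_date.month)].add(client)
    return {ym: len(clients) for ym, clients in sorted(by_month.items())}


# Hypothetical sample: "b" is missing in March, "c" only appears in February.
sales = [
    ("a", date(2018, 1, 5)), ("b", date(2018, 1, 9)),
    ("a", date(2018, 2, 3)), ("b", date(2018, 2, 7)), ("c", date(2018, 2, 8)),
    ("a", date(2018, 3, 1)),
]
stable = clients_in_all_months(sales)
print(stable)                         # {'a'}
print(monthly_counts(sales, stable))  # {(2018, 1): 1, (2018, 2): 1, (2018, 3): 1}
```

In SQL the same two steps become "group by client having (distinct months) = (total months)" followed by a join back to the sales table.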

sql amazon-redshift

Source: https://stackoverflow.com/questions/62242021/sql-use-only-clients-that-are-present-in-all-months

3 answers


iaqfqrcu1#

If I understand what you want correctly, and this is just a one-off query, you can use a correlated subquery in the WHERE clause:

SELECT 
    DATE_PART(y, s.sold_date) AS year,
    DATE_PART(mm, s.sold_date) AS month,
    COUNT(DISTINCT s.client)
FROM
    sales_data AS s
WHERE
    EXISTS (
        -- keep only clients that also have a sale in January 2018
        SELECT sd.client
        FROM sales_data AS sd
        WHERE DATE_PART(y, sd.sold_date) = 2018
          AND DATE_PART(mm, sd.sold_date) = 1
          AND sd.client = s.client
    ) AND
    s.sold_date > '2018-01-01' 
GROUP BY
    year,
    month 
ORDER BY
    DATE_PART(y, s.sold_date),
    DATE_PART(mm, s.sold_date)
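
To see what the correlated EXISTS filter does, here is a sketch that runs an equivalent query against an in-memory SQLite database from Python. It assumes a `sales_data(client, sold_date)` table with hypothetical rows; SQLite has no DATE_PART, so `strftime` stands in for it. The output shows that the filter keeps clients that had a sale in January 2018: client "b", which drops out in March, is still counted in January and February.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, sold_date TEXT)")
# Hypothetical rows: "b" has no sale in March, "c" starts in February.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [
        ("a", "2018-01-05"), ("b", "2018-01-09"),
        ("a", "2018-02-03"), ("b", "2018-02-07"), ("c", "2018-02-08"),
        ("a", "2018-03-01"), ("c", "2018-03-04"),
    ],
)

# Same shape as the answer: keep a row only if its client also has a
# sale in 2018-01 (correlated EXISTS), then count clients per month.
query = """
SELECT strftime('%Y', s.sold_date) AS year,
       strftime('%m', s.sold_date) AS month,
       COUNT(DISTINCT s.client) AS clients
FROM sales_data AS s
WHERE EXISTS (
        SELECT 1 FROM sales_data AS sd
        WHERE strftime('%Y-%m', sd.sold_date) = '2018-01'
          AND sd.client = s.client
      )
  AND s.sold_date > '2018-01-01'
GROUP BY year, month
ORDER BY year, month
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2018', '01', 2), ('2018', '02', 2), ('2018', '03', 1)]
```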

2021-07-26

klr1opcd2#

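Restricting to clients present in all months can be done with a two-step aggregation:
1. Group sales_data by client, keeping only the clients that have all months.
2. Join (1) to sales_data grouped by year and month.
Like this (`= 12` can be a dynamic expression, depending on how much history you have):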

with
 stable_customers as (
    -- clients with at least one sale in every month of the period
    select client
    from sales_data
    where sold_date > '2018-01-01'
    group by 1
    -- 12 = number of months in the period; can be a dynamic expression
    having count(distinct date_trunc('month', sold_date)) = 12
)
select 
  DATE_PART(y, sold_date) as year
 ,DATE_PART(mm, sold_date) as month
 ,count(distinct client)
from sales_data
join stable_customers
using (client)
where sold_date > '2018-01-01' 
group by year, month 
order by year, month
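
A runnable sketch of the same two-step aggregation, using an in-memory SQLite database from Python with hypothetical rows. Two assumptions differ from the answer: `strftime('%Y-%m', ...)` replaces `date_trunc`, and the hard-coded `12` is replaced by one possible "dynamic expression", the number of distinct months found in the period itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, sold_date TEXT)")
# Hypothetical rows: only "a" has sales in all three months.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [
        ("a", "2018-01-05"), ("b", "2018-01-09"),
        ("a", "2018-02-03"), ("b", "2018-02-07"), ("c", "2018-02-08"),
        ("a", "2018-03-01"), ("c", "2018-03-04"),
    ],
)

query = """
WITH period AS (
    -- rows inside the reporting window, tagged with their month
    SELECT client, strftime('%Y-%m', sold_date) AS ym
    FROM sales_data
    WHERE sold_date > '2018-01-01'
),
stable_clients AS (
    -- step 1: clients whose distinct-month count equals the number of
    -- distinct months in the whole period (instead of a fixed 12)
    SELECT client
    FROM period
    GROUP BY client
    HAVING COUNT(DISTINCT ym) = (SELECT COUNT(DISTINCT ym) FROM period)
)
-- step 2: per-month distinct counts, restricted to those clients
SELECT p.ym, COUNT(DISTINCT p.client) AS clients
FROM period AS p
JOIN stable_clients USING (client)
GROUP BY p.ym
ORDER BY p.ym
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2018-01', 1), ('2018-02', 1), ('2018-03', 1)]
```

Computing the month total from the data means a month in which no client has any sale is not required; a calendar table would be needed to enforce that.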

2021-07-26

sq1bmfud3#

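Use window functions. Unfortunately, SQL Server does not support COUNT(DISTINCT) as a window function. Fortunately, dense_rank() works around this: on every row, dense_rank() in ascending order plus dense_rank() in descending order equals the number of distinct values plus one, so comparing that sum over all rows with the same sum per client tells you whether the client appears in every month: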

select year, month, count(distinct client)
from (select sd.*, year, month,
             (dense_rank() over (order by year, month) +
              dense_rank() over (order by year desc, month desc)
             ) as num_months,
             (dense_rank() over (partition by client order by year, month) +
              dense_rank() over (partition by client order by year desc, month desc)
             ) as num_months_client
      from sales_data sd cross apply
           (values (year(sold_date), month(sold_date))) v(year, month)
      where sd.sold_date > '2018-01-01' 
     ) sd
where num_months_client = num_months
group by year, month 
order by year, month;
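
The dense_rank trick can be tried out in SQLite from Python, since SQLite also lacks COUNT(DISTINCT) as a window function. This sketch assumes a bundled SQLite of version 3.25 or later (needed for window functions) and hypothetical rows; a plain derived table replaces SQL Server's CROSS APPLY, and a single `ym` month key replaces the separate year and month columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, sold_date TEXT)")
# Hypothetical rows: "a" covers all three months, "b" and "c" cover two each.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [
        ("a", "2018-01-05"), ("b", "2018-01-09"),
        ("a", "2018-02-03"), ("b", "2018-02-07"), ("c", "2018-02-08"),
        ("a", "2018-03-01"), ("c", "2018-03-04"),
    ],
)

# ascending dense_rank + descending dense_rank = distinct values + 1 on
# every row, so equal sums mean the client covers every month.
query = """
SELECT ym, COUNT(DISTINCT client) AS clients
FROM (
    SELECT client, ym,
           dense_rank() OVER (ORDER BY ym)
         + dense_rank() OVER (ORDER BY ym DESC) AS num_months,
           dense_rank() OVER (PARTITION BY client ORDER BY ym)
         + dense_rank() OVER (PARTITION BY client ORDER BY ym DESC)
             AS num_months_client
    FROM (
        SELECT client, strftime('%Y-%m', sold_date) AS ym
        FROM sales_data
        WHERE sold_date > '2018-01-01'
    ) AS v
) AS sd
WHERE num_months_client = num_months
GROUP BY ym
ORDER BY ym
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2018-01', 1), ('2018-02', 1), ('2018-03', 1)]
```

Here every row gets num_months = 4 (three distinct months plus one), while "b" and "c" get num_months_client = 3, so only "a" survives the filter.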

Note: this looks at all months present in the data. If every client is missing 2019-03, that month is simply not counted.

2021-07-26
