postgresql SQL筛选器集

z0qdvdin  于 2023-08-04  发布在  PostgreSQL
关注(0)|答案(2)|浏览(102)

我面临着一个小小的挑战。我知道这种挑战应该通过使用python来解决,但我决定用SQL来解决。我正在寻找一点帮助,因为我是一种初学者:)
我有一组4列的数据。

  • 第一列表示账户ID(应收款)
  • 第二列表示交易日期
  • 第3列表示金额
  • 第4列表示已将交易发送到应收账款的账户ID

首先我做的是我已经设置了一种子表,将分组帐户ID(应收)。原因是,我想应用一套过滤器对一组记录从特定的帐户ID应收款。我把这个查询设置成这样:

SELECT t1.account_receivable,t1.datum,t1.amount,t1.account_payable
FROM python_table as t1
WHERE t1.account_receivable IN (
    SELECT t2.account_receivable
    FROM python_table as t2
    GROUP BY 1                  )

字符串
现在我想做这个疯狂的条件来记录我只想记录,其中至少有3个连续月从同一帐户ID的第4列发送。有不能发送超过1个交易在同一个月其他我不想显示这个。天的间隔必须最大为从组的最低天开始的5天(连续月份)
这里是1个帐户ID(应收)的一组数据。我已经尽可能多地解释了我想要达到的目标...我试了几乎所有的方法,但无法解决这一个,..问我自己,在SQL中有这么多条件吗?

61441   2014-04-28  102 45437871
61441   2014-04-28  15346   45437871
61441   2014-05-16  98  306658150
**61441 2014-04-28  711 323671229
61441   2014-05-23  694 323671229
61441   2014-06-25  701 323671229
61441   2014-07-25  702 323671229
61441   2014-08-25  694 323671229
61441   2014-09-25  644 323671229**
**61441 2014-06-09  3697    342058995   this set will not match condition as interval for day
61441   2014-07-04  3692    342058995   from lowest to highest is more than 5 days
61441   2014-08-06  3665    342058995
61441   2014-09-10  3672    342058995**
61441   2014-06-10  8409    357368301
61441   2014-04-24  4136    412899724
**61441 2014-04-28  1261    440261807
61441   2014-05-23  1271    440261807
61441   2014-06-25  1267    440261807
61441   2014-07-25  1259    440261807
61441   2014-08-25  1274    440261807
61441   2014-09-25  1120    440261807**
61441   2014-06-19  141 441460477
61441   2014-08-06  314 518735975
**61441 2014-04-01  17032   547166056
61441   2014-05-02  45773   547166056
61441   2014-06-02  17821   547166056
61441   2014-07-01  17445   547166056
61441   2014-08-01  25562   547166056
61441   2014-09-02  17459   547166056**
61441   2014-09-05  157 686201636
61441   2014-09-19  126 686201636
**61441 2014-04-14  7233    762490320  This will not match condition as it has 3 transactions in
61441   2014-05-19  9703    762490320  same month
61441   2014-06-16  8875    762490320
61441   2014-07-14  8274    762490320
61441   2014-07-18  1436    762490320
61441   2014-07-28  841 762490320
61441   2014-08-15  11008   762490320
61441   2014-09-16  8334    762490320**
61441   2014-05-16  340 838201881
61441   2014-05-21  2480    838201881
61441   2014-07-14  295 838201881
61441   2014-07-14  933 838201881
61441   2014-08-25  1696    838201881
61441   2014-08-25  849 838201881
61441   2014-04-28  2011    842644517
61441   2014-09-22  8295    842644517
61441   2014-07-09  35  982718888

qco9c6ql

qco9c6ql1#

这可以通过窗口函数来实现。
以下查询的结果与您的不同之处在于,从762490320到61441的付款在2014年4月到2014年6月期间有一个三个月的区块,该区块符合其他标准,然后在7月份被三次付款中断。

with lines as (
  select *, date_trunc('month', datum)::date as year_month,
         extract('day' from datum) as day_of_month
    from python_table
), double_taps as (
  select *, 
         count(*) 
           over (partition by account_receivable, account_payable, year_month)
           as monthly_count
    from lines
), consecutive_island_breaks as (
  select *, 
         case 
           when year_month - interval '1 month' = lag(year_month) over w then 0
           else 1
         end as new_group
    from double_taps
   where monthly_count = 1
  window w as (partition by account_receivable, account_payable order by datum) 
), islands as (
  select *,
         sum(new_group) over w as island_number
    from consecutive_island_breaks
  window w as (partition by account_receivable, account_payable 
                   order by datum)
), island_sizes as (
  select *,
         count(*) over w as island_size,
         max(day_of_month) over w - min(day_of_month) over w as day_range
    from islands
  window w as (partition by account_receivable, account_payable, island_number)
)
select * 
  from island_sizes
 where island_size >= 3
   and day_range <= 5;

字符串
Working fiddle

hlswsv35

hlswsv352#

first_value(datum) OVER (PARTITION BY account_receivable,
        account_payable ORDER BY datum) AS first_datum

字符串
解决了“天数间隔必须从组的最低日起最大为5(连续月份)”的问题。

count(*) OVER (PARTITION BY  
 account_receivable,account_payable,date_trunc('month', datum)) AS cnt_txn_mon,


解决“同一个月不能发送超过1笔交易,否则我不想显示此交易”
“我只需要至少连续3个月的记录”:
表示count of(lead(date_trunc('month ',datum),1)= date_trunc('month',datum)+ interval '1 month' is true)>= 2。
演示:https://dbfiddle.uk/uwXO0tDe
查询:

WITH cte AS 
(
    SELECT
        datum,
        account_receivable,
        account_payable,
        /* for the consective month computation */
        coalesce(lead(date_trunc('month', datum)::date, 1) 
            OVER (PARTITION BY account_receivable, account_payable ORDER BY datum)
            ,date_trunc('month', datum)::date) 
        = (date_trunc('month', datum) + interval '1 month')::date AS is_next_month,
        /*for max 1 txns*/
        count(*) OVER (
            PARTITION BY account_receivable,account_payable,date_trunc('month', datum)) AS cnt_txn_mon,
        /*for maximum 5 from the lowest day of group*/
        first_value(datum) OVER (
                PARTITION BY account_receivable,account_payable ORDER BY datum) AS first_datum
    FROM  python_table
),cte1 AS 
(
    SELECT
        datum,
        account_receivable,
        account_payable,
        cnt_txn_mon,
        /*ignore date order, means in a group, there is a 3 consective month is fine */
        count(is_next_month) FILTER (WHERE is_next_month) OVER (PARTITION BY account_receivable,
            account_payable) AS cnt_mon_gaps,
        /*for maximum 5 from the lowest day of group*/
        abs(date_trunc('month', first_datum)::date - first_datum) <= 5
        OR abs((date_trunc('month', first_datum) + interval '1 month')::date - first_datum) <= 5 AS date_near_month
    FROM    cte        
    WHERE   cnt_txn_mon = 1
        
)
SELECT *
FROM      cte1
WHERE     cnt_mon_gaps >= 2 AND date_near_month
ORDER BY  account_payable,datum;

相关问题