Apache Spark 如何在sql中动态计算当月和上月?

mwkjh3gx  于 2023-08-06  发布在  Apache
关注(0)|答案(1)|浏览(171)

如何保持本月和上个月的金额并排,使两者相减后的结果为320614.5204
下面是我的输入数据--

+----+------+----+-----+----+----+-----------+
|colA|period|colB|colC |colD|colE|amount     |
+----+------+----+-----+----+----+-----------+
|abc |202301|dt  |20110|1236|0000|200.9342749|
|abc |202301|dt  |20110|1200|0000|805086.453 |
|abc |202301|dt  |20110|1230|0000|1320.738729|
|abc |202301|dt  |20110|1057|0000|112563.2048|
|abc |202301|dt  |20110|0000|0000|481446.1162|
|abc |202301|dt  |20110|1185|0000|33143.72624|
|abc |202301|dt  |20110|1299|0000|179683.4546|
|abc |202301|dt  |20110|1208|0000|4645.649104|
|abc |202301|dt  |20110|8307|0000|12583.31342|
|abc |202301|dt  |20110|1903|0000|108461.0646|
|abc |202301|dt  |20110|1290|0000|1018387.778|
|abc |202301|dt  |20110|1711|0000|1367565.247|
|abc |202302|dt  |20110|1712|0000|23.19162437|
|abc |202302|dt  |20110|8158|0000|18382.71997|
|abc |202302|dt  |20110|1290|0000|828198.4983|
|abc |202302|dt  |20110|1903|0000|377226.3219|
|abc |202302|dt  |20110|1057|0000|51077.79188|
|abc |202302|dt  |20110|1299|0000|156910.4484|
|abc |202302|dt  |20110|0000|0000|513661.3896|
|abc |202302|dt  |20110|1185|0000|32263.95939|
|abc |202302|dt  |20110|1208|0000|733.999577 |
|abc |202302|dt  |20110|1236|0000|268.0203046|
|abc |202302|dt  |20110|8307|0000|11835.60702|
|abc |202302|dt  |20110|1711|0000|1311061.665|
|abc |202302|dt  |20110|1200|0000|1144058.608|
+----+------+----+-----+----+----+-----------+

字符串
这仅适用于当前一个月和前一个月。如何推导出相同的期间范围?话虽如此,当前月份和前一个月将以这样的方式滚动,即当前月份和当前前一个月将在定义的周期范围内循环。例如,如果我的周期范围是从202212到202306(可以是任何数量的月,在这种情况下是7个月),那么下面是它将如何实现-

1. Current and Previous month would be 202301 and 202212 respectively
2. Current and Previous month would be 202302 and 202301 respectively
3. Current and Previous month would be 202303 and 202302 respectively
4. Current and Previous month would be 202304 and 202303 respectively
5. Current and Previous month would be 202305 and 202304 respectively
6. Current and Previous month would be 202305 and 202304 respectively
7. Current and Previous month would be 202306 and 202305 respectively
...
...
...
So on and so forth for other use cases as well


我试着在202302和202301期间使用laglead函数,但它没有给我正确的结果-

Select 
    LOWER(colA) as colA, 
    "period", 
    LOWER(colB) as colB, 
    LOWER(colC) as colC, 
    LOWER(colD) as colD,
    LOWER(colE) as colE, 
    COALESCE(SUM(usdamount), 0) AS current_month, 
    LAG(COALESCE(SUM(usdamount), 0)) over(order by period ) as previous_month
from cte 
where trim(colC) = '20110' and lower(trim(colB)) = 'dt' and "period" between 202301 and  202302 
group by colA, period, colB, colC, colD, colE


本质上我的用法是SUM(CASE WHEN period = 202302 THEN amount END) - SUM(CASE WHEN period = 202301 THEN amount END)应该等同于320614.5204,并且需要如上所述动态化。此外,分组是必要的,因为我需要他们在输出以及。

z9gpfhce

z9gpfhce1#

period不需要是数据库中的字符串。使用日期数学,这很容易:

create table amounts(
  colA text not null,
  period int not null,
  colB text not null,
  colC int not null,
  colD int not null,
  colE int not null,
  amount real not null
);

insert into amounts(colA, period, colB, colC, colD, colE, amount) values
('abc', '202301', 'dt', 20110, 1236, 0000, 200.9342749),
('abc', '202301', 'dt', 20110, 1200, 0000, 805086.453 ),
('abc', '202301', 'dt', 20110, 1230, 0000, 1320.738729),
('abc', '202301', 'dt', 20110, 1057, 0000, 112563.2048),
('abc', '202301', 'dt', 20110, 0000, 0000, 481446.1162),
('abc', '202301', 'dt', 20110, 1185, 0000, 33143.72624),
('abc', '202301', 'dt', 20110, 1299, 0000, 179683.4546),
('abc', '202301', 'dt', 20110, 1208, 0000, 4645.649104),
('abc', '202301', 'dt', 20110, 8307, 0000, 12583.31342),
('abc', '202301', 'dt', 20110, 1903, 0000, 108461.0646),
('abc', '202301', 'dt', 20110, 1290, 0000, 1018387.778),
('abc', '202301', 'dt', 20110, 1711, 0000, 1367565.247),
('abc', '202302', 'dt', 20110, 1712, 0000, 23.19162437),
('abc', '202302', 'dt', 20110, 8158, 0000, 18382.71997),
('abc', '202302', 'dt', 20110, 1290, 0000, 828198.4983),
('abc', '202302', 'dt', 20110, 1903, 0000, 377226.3219),
('abc', '202302', 'dt', 20110, 1057, 0000, 51077.79188),
('abc', '202302', 'dt', 20110, 1299, 0000, 156910.4484),
('abc', '202302', 'dt', 20110, 0000, 0000, 513661.3896),
('abc', '202302', 'dt', 20110, 1185, 0000, 32263.95939),
('abc', '202302', 'dt', 20110, 1208, 0000, 733.999577 ),
('abc', '202302', 'dt', 20110, 1236, 0000, 268.0203046),
('abc', '202302', 'dt', 20110, 8307, 0000, 11835.60702),
('abc', '202302', 'dt', 20110, 1711, 0000, 1311061.665),
('abc', '202302', 'dt', 20110, 1200, 0000, 1144058.608);

with sane_dates as (
  select period,
         make_date(year => period/100, month=>period%100, day=>1) as period_start,
         sum(amount) as amount
  from amounts
  group by period
)
select amounts.*, differences.diff
from amounts
join (
  select second.period as period,
         second.amount - first.amount as diff
  from sane_dates as first
  join sane_dates as second on second.period_start = first.period_start + make_interval(months => 1)
) differences on differences.period = amounts.period;

个字符

相关问题