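I have the data below, which contains monthly targets for a set of IDs. There is one target per id for each month of 2020. The table is named `targets`. The `month` column holds the month of the year.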
+-------+-------+----+--------+
| month | name | id | target |
+-------+-------+----+--------+
| 1 | Comp1 | 1 | 6000 |
+-------+-------+----+--------+
| 2 | Comp1 | 1 | 6000 |
+-------+-------+----+--------+
| 3 | Comp1 | 1 | 6000 |
+-------+-------+----+--------+
| 1 | Comp2 | 2 | 6000 |
+-------+-------+----+--------+
| 2 | Comp2 | 2 | 6000 |
+-------+-------+----+--------+
| 3 | Comp2 | 2 | 6000 |
+-------+-------+----+--------+
| 1 | Comp3 | 3 | 6000 |
+-------+-------+----+--------+
| 2 | Comp3 | 3 | 6000 |
+-------+-------+----+--------+
| 3 | Comp3 | 3 | 6000 |
+-------+-------+----+--------+
| 1 | Comp4 | 4 | 6000 |
+-------+-------+----+--------+
| 2 | Comp4 | 4 | 6000 |
+-------+-------+----+--------+
| 3 | Comp4 | 4 | 6000 |
+-------+-------+----+--------+
Then I have a second table, which contains daily data for a set of ids and is updated every day. In my real dataset I have data from 2019-01-01 up to today.
+------------+-------+----+--------+--------+
| yyyy_mm_dd | name | id | actual | region |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp1 | 1 | 1000 | LATAM |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp1 | 1 | 0 | EU |
+------------+-------+----+--------+--------+
| 2019-01-02 | Comp1 | 1 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-03 | Comp1 | 1 | 4000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp2 | 2 | 1000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-02 | Comp2 | 2 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-03 | Comp2 | 2 | 3000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp3 | 3 | 1000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-02 | Comp3 | 3 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-03 | Comp3 | 3 | 8000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp4 | 4 | 1000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-02 | Comp4 | 4 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-02-03 | Comp4 | 4 | 3000 | EU |
+------------+-------+----+--------+--------+
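Based on the two tables above, I want to create a third table with some additional logic. In the end, I want to introduce a new column named `payout`. This column should always be 0 unless the company has passed its monthly target. If the monthly target is met or passed, the payout should be (sum of `actual` for that month - monthly target for that month) * 1%.
Here is what the output data should look like: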
+------------+-------+----+--------+--------+
| yyyy_mm_dd | name | id | actual | payout |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp1 | 1 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp1 | 1 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-03 | Comp1 | 1 | 4000 | 10 |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp2 | 2 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp2 | 2 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-03 | Comp2 | 2 | 3000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp3 | 3 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp3 | 3 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-03 | Comp3 | 3 | 8000 | 50 |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp4 | 4 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp4 | 4 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-02-03 | Comp4 | 4 | 3000 | 0 |
+------------+-------+----+--------+--------+
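Every name/ID in the dataset above has a monthly `target` of £6000, so there should only be a `payout` once a name/ID passes that target within the month. Comp1 and Comp3 both pass the monthly target on the third day of January, so they receive a payout from that day until the end of the month. It then resets in February, since that is a new month with a new target, and we get new daily data as the month progresses.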
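To make the rule concrete, the arithmetic behind the expected output can be sketched in plain Python. This is an illustration only, not the Hive query being asked for. The function name `daily_payouts` and its `rate_percent` parameter are made up for this example, and it assumes one `actual` value per day for a single company within a single month (already summed across regions).

```python
from itertools import accumulate


def daily_payouts(daily_actuals, monthly_target, rate_percent=1):
    """Return one payout per day for one company within one month.

    The payout stays 0 until the month-to-date total reaches the target;
    from then on it is rate_percent % of the amount above the target.
    """
    payouts = []
    for running_total in accumulate(daily_actuals):
        excess = running_total - monthly_target
        payouts.append(excess * rate_percent / 100 if excess >= 0 else 0.0)
    return payouts


# Comp1: 1000 + 2000 + 4000 = 7000, which is 1000 above the 6000 target
print(daily_payouts([1000, 2000, 4000], 6000))  # [0.0, 0.0, 10.0]
# Comp3: 1000 + 2000 + 8000 = 11000, which is 5000 above the target
print(daily_payouts([1000, 2000, 8000], 6000))  # [0.0, 0.0, 50.0]
```

Comp2 ends the three days at exactly 6000, so its excess is 0 and the payout stays at 0, which matches the table above.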
What I have tried:
SELECT
agg.yyyy_mm_dd,
agg.name,
agg.id,
CASE WHEN agg.actual >= targets.target THEN ((agg.actual-targets.target)/100) * 1 ELSE 0 END AS payout
FROM(
SELECT
sum(x.actual) AS actual,
x.yyyy_mm_dd,
x.name,
x.id
FROM(
SELECT
yyyy_mm_dd,
name,
id,
cast(actual as int) as actual
FROM
schema.daily_data
WHERE
yyyy_mm_dd >= '2020-01-01' AND (name = 'Comp1' OR name = 'Comp2')
) x
GROUP BY
2,3,4
) agg
INNER JOIN(
SELECT
id,
month,
target
FROM
schema.targets
) targets ON targets.id = agg.id
GROUP BY
1,2,3,4
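However, the above outputs multiple rows per `name`. This is because the daily table contains the same company several times per day (which is expected). I thought my GROUP BY would take care of that. I also doubt this is the simplest solution; I may be overthinking it, and it could probably be done more efficiently.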
4 Answers
#1
It looks like you want to compare the `actual` for each company against its monthly `target`. You can do this with a join and window functions.
#2
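The request for a running (partial) sum of the actual values is easily solved with window functions. Unfortunately I don't use Hive, so my working solution is written for Postgres.
Extracting the month from the date may have to be done differently, but I hope you get the idea.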
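As a rough sketch of the join-plus-window-function idea (not the query originally posted with this answer), the snippet below runs the SQL in SQLite through Python's `sqlite3` module, so it can be tried without a Hive cluster. Several things here are assumptions: the sample rows reuse the question's numbers with 2020 dates, there is exactly one row per company per day, and the SQLite build supports window functions (3.25 or later). The month is taken with `substr` on the date string; in Hive, `month(yyyy_mm_dd)` would probably be the more natural choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE targets (month INTEGER, name TEXT, id INTEGER, target INTEGER);
    CREATE TABLE daily_data (yyyy_mm_dd TEXT, name TEXT, id INTEGER, actual INTEGER);
    INSERT INTO targets VALUES
        (1, 'Comp1', 1, 6000), (2, 'Comp1', 1, 6000),
        (1, 'Comp2', 2, 6000), (2, 'Comp2', 2, 6000),
        (1, 'Comp3', 3, 6000), (2, 'Comp3', 3, 6000),
        (1, 'Comp4', 4, 6000), (2, 'Comp4', 4, 6000);
    INSERT INTO daily_data VALUES
        ('2020-01-01', 'Comp1', 1, 1000), ('2020-01-02', 'Comp1', 1, 2000),
        ('2020-01-03', 'Comp1', 1, 4000),
        ('2020-01-01', 'Comp2', 2, 1000), ('2020-01-02', 'Comp2', 2, 2000),
        ('2020-01-03', 'Comp2', 2, 3000),
        ('2020-01-01', 'Comp3', 3, 1000), ('2020-01-02', 'Comp3', 3, 2000),
        ('2020-01-03', 'Comp3', 3, 8000),
        ('2020-01-01', 'Comp4', 4, 1000), ('2020-01-02', 'Comp4', 4, 2000),
        ('2020-02-03', 'Comp4', 4, 3000);
""")

QUERY = """
SELECT yyyy_mm_dd, name, id, actual,
       CASE WHEN running_actual >= target
            THEN (running_actual - target) / 100.0   -- 1% of the excess
            ELSE 0.0
       END AS payout
FROM (
    SELECT d.yyyy_mm_dd, d.name, d.id, d.actual, t.target,
           -- month-to-date total per company; the partition on the
           -- 'YYYY-MM' prefix makes the total restart every month
           SUM(d.actual) OVER (
               PARTITION BY d.id, substr(d.yyyy_mm_dd, 1, 7)
               ORDER BY d.yyyy_mm_dd
           ) AS running_actual
    FROM daily_data d
    JOIN targets t
      ON t.id = d.id
     -- join on the month as well, so each daily row meets one target only
     AND t.month = CAST(substr(d.yyyy_mm_dd, 6, 2) AS INTEGER)
    WHERE d.yyyy_mm_dd >= '2020-01-01' AND d.yyyy_mm_dd < '2021-01-01'
) s
ORDER BY id, yyyy_mm_dd
"""

rows = conn.execute(QUERY).fetchall()
payout = {(name, day): value for day, name, _id, _actual, value in rows}
print(payout[("Comp1", "2020-01-03")])  # 10.0
print(payout[("Comp3", "2020-01-03")])  # 50.0
```

Joining on both `id` and the month matters here: joining on `id` alone would pair every daily row with every monthly target of that company and multiply the rows.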
#3
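Another option is to use a windowed `SUM` function to create a running total, and then use that in a `CASE` statement to get the column value. I'm not 100% sure of the Hive syntax, but this is pretty close. Specifically, `ROWS UNBOUNDED PRECEDING` might not be enough; you might need a `FOLLOWING` in there to get the correct total.
#4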
I think I have a working solution now. It gives the expected output. It could probably be optimized a bit, since it is not the fastest.
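One detail that any solution along these lines has to handle is that the daily table holds several rows per company per day (one per region). The sketch below shows one possible way to deal with that and is not necessarily the query used above: collapse the regions into one row per company per day first, then take the running total. It runs in SQLite through Python's `sqlite3` as a stand-in for Hive. The CTE names `per_day` and `running` are made up for this example, the rows are Comp1's rows from the question moved to 2020, and a SQLite version with window functions (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE targets (month INTEGER, id INTEGER, target INTEGER);
    CREATE TABLE daily_data (
        yyyy_mm_dd TEXT, name TEXT, id INTEGER, actual INTEGER, region TEXT
    );
    INSERT INTO targets VALUES (1, 1, 6000), (2, 1, 6000);
    INSERT INTO daily_data VALUES
        ('2020-01-01', 'Comp1', 1, 1000, 'LATAM'),
        ('2020-01-01', 'Comp1', 1,    0, 'EU'),
        ('2020-01-02', 'Comp1', 1, 2000, 'EU'),
        ('2020-01-03', 'Comp1', 1, 4000, 'EU');
""")

QUERY = """
WITH per_day AS (
    -- collapse the per-region rows into one row per company per day
    SELECT yyyy_mm_dd, name, id, SUM(actual) AS actual
    FROM daily_data
    GROUP BY yyyy_mm_dd, name, id
),
running AS (
    -- month-to-date total; with one row per day the ROWS frame is unambiguous
    SELECT yyyy_mm_dd, name, id, actual,
           SUM(actual) OVER (
               PARTITION BY id, substr(yyyy_mm_dd, 1, 7)
               ORDER BY yyyy_mm_dd
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_actual
    FROM per_day
)
SELECT r.yyyy_mm_dd, r.name, r.id, r.actual,
       CASE WHEN r.running_actual >= t.target
            THEN (r.running_actual - t.target) / 100.0
            ELSE 0.0
       END AS payout
FROM running r
JOIN targets t
  ON t.id = r.id
 AND t.month = CAST(substr(r.yyyy_mm_dd, 6, 2) AS INTEGER)
ORDER BY r.id, r.yyyy_mm_dd
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The four input rows come out as three output rows, one per day, with the payout appearing on 2020-01-03. Aggregating before the window function also keeps the window working on fewer rows, which may help with speed on a large daily table.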