FIRST_VALUE in Athena or Spark

zzwlnbp8 posted on 2023-02-16 in Apache
select id
      ,id2
      ,FIRST_VALUE(CASE WHEN app THEN date0 ELSE NULL END) IGNORE NULLS  OVER (PARTITION BY id ORDER BY date0) as date_result
from (
select 1 id, 22 as id2, false app, Date'2019-03-13' as date0
union
select 1 id, 23 as id2, true app, Date'2019-03-14' as date0
union
select 1 id, 23 as id2, true app, Date'2019-03-15' as date0
)

The above query returns the following in Athena:
| id | id2 | date_result |
| ------------ | ------------ | ------------ |
| 1 | 22 | |
| 1 | 23 | 2019-03-14 |
| 1 | 23 | 2019-03-14 |
But I was expecting the result below, since date_result uses IGNORE NULLS and partitions by id:
| id | id2 | date_result |
| ------------ | ------------ | ------------ |
| 1 | 22 | 2019-03-14 |
| 1 | 23 | 2019-03-14 |
| 1 | 23 | 2019-03-14 |
Could you please let me know what I am doing wrong with first_value? What is the best way to achieve this result in both Athena and Spark? Thanks.


41zrol4v1#

Could you please let me know what I am doing wrong with first_value?
The default frame for window functions is unbounded preceding to current row:
If frame_end is not specified, a default value of CURRENT ROW is used. If no frame is specified, a default frame of RANGE UNBOUNDED PRECEDING is used.
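
For illustration, here is a minimal sketch (Athena/Trino syntax, not part of the original answer) that spells that default frame out explicitly over the same data. It reproduces the output observed above: the frame stops at the current row, so the first row of the partition has not yet reached any non-null value.

with dataset(id, id2, app, date0) as (
    values (1, 22, false, Date'2019-03-13'),
    (1, 23, true, Date'2019-03-14'),
    (1, 23, true, Date'2019-03-15')
)

select id
     , id2
     -- explicit form of the implicit default frame
     , FIRST_VALUE(CASE WHEN app THEN date0 END) IGNORE NULLS
        OVER (PARTITION BY id ORDER BY date0 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as date_result
from dataset;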
If you want to look for the value across the whole partition, you need to specify the frame, for example:

with dataset(id, id2, app, date0) as (
    values (1, 22, false, Date'2019-03-13'),
    (1, 23, true, Date'2019-03-14'),
    (1, 23, true, Date'2019-03-15')
)

select id
     , id2
     , FIRST_VALUE(if(app, date0)) IGNORE NULLS
        OVER (PARTITION BY id ORDER BY date0 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as date_result
from dataset;
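
For the Spark half of the question, the same fix should carry over. Below is a hedged sketch in Spark SQL (assuming Spark 3.x; the inline VALUES table and the alias names are just for illustration, not from the original answer). It uses first_value(expr, true) to ignore NULLs, since the IGNORE NULLS keyword form is only accepted in newer Spark versions (around 3.1), and a CASE expression instead of Trino's two-argument if(), which Spark does not support:

with dataset as (
    select * from values
        (1, 22, false, date'2019-03-13'),
        (1, 23, true, date'2019-03-14'),
        (1, 23, true, date'2019-03-15')
        as t(id, id2, app, date0)
)

select id
     , id2
     -- second argument true = ignore nulls (newer Spark also accepts IGNORE NULLS syntax)
     , first_value(case when app then date0 end, true)
        over (partition by id order by date0 range between unbounded preceding and unbounded following) as date_result
from dataset;

Spark's default frame when an ORDER BY is present is also RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the explicit frame is needed there as well.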
