How to compute the difference between consecutive rows of a Spark DataFrame in Scala [closed]

tf7tbtn2 · published 2023-06-29 in Scala

**Closed.** This question needs debugging details. It is not currently accepting answers.

Edit the question to include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer it.

Closed 5 days ago.

  • I have a DataFrame with the structure below. Whenever ch_status is zero, I need to compute the difference between the corresponding rows of the TEMPERATURE column and show it as a diff in a new column, for each id.

The expected result would be:

Can you help me with how to do this?

nqwrtyyt

You could use a gaps-and-islands approach to find the consecutive runs, but this alternative is simpler:

```sql
with data as (
    select *,
        -- previous row's status and temperature within each id, ordered by time
        lag(ch_status) over (partition by id order by timestamp) as last_status,
        lag(temperature) over (partition by id order by timestamp) as last_temperature
    from T
)
-- sum the row-to-row temperature changes where both the current
-- and the previous row have ch_status = 0
select id, sum(temperature - last_temperature) as diff
from data
where ch_status = 0 and last_status = 0
group by id;
```

This works whether or not the sequence is monotonic (always increasing).
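Since the question asks for Scala, the same logic can be expressed with the Spark DataFrame API using `lag` over a window. A minimal sketch, assuming columns named `id`, `timestamp`, `ch_status`, and `temperature` (the sample rows are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object TemperatureDiff {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TemperatureDiff")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data matching the described columns
    val df = Seq(
      (1, "2023-01-01 00:00:00", 0, 20.0),
      (1, "2023-01-01 01:00:00", 0, 22.5),
      (1, "2023-01-01 02:00:00", 1, 30.0),
      (2, "2023-01-01 00:00:00", 0, 15.0),
      (2, "2023-01-01 01:00:00", 0, 18.0)
    ).toDF("id", "timestamp", "ch_status", "temperature")

    // Same window as the SQL: per id, ordered by timestamp
    val w = Window.partitionBy("id").orderBy("timestamp")

    val result = df
      .withColumn("last_status", lag("ch_status", 1).over(w))
      .withColumn("last_temperature", lag("temperature", 1).over(w))
      // keep only pairs where both the current and previous row have ch_status = 0
      .where($"ch_status" === 0 && $"last_status" === 0)
      .groupBy("id")
      .agg(sum($"temperature" - $"last_temperature").as("diff"))

    result.orderBy("id").show()
    spark.stop()
  }
}
```

This mirrors the SQL answer one-to-one: `lag(...).over(w)` replaces the `lag() over (partition by ... order by ...)` expressions, and the `where`/`groupBy`/`agg` chain replaces the outer query.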

相关问题