hive 使用相邻行中的值重新计算

z9ju0rcb  于 2023-04-20  发布在  Hive
关注(0)|答案(1)|浏览(240)

我试图创建一个逻辑,使用Apache Hive或Spark SQL使用相邻行中的数据进行重新计算,但我不确定如何实现,所以我问了一个问题。

12
      11        1
   10               2
9                       3
   8                4
       7        5
            6
hh  val 
1   5
2   4
3   3
4   2
5   1
6   5
7   4
8   3
9   2
10  1
11  5
12  4

重新计算逻辑为:将两个相邻时区的值相加。通过将10点处的1、11点处的5、1点处的5和2点处的4相加到12点处的4,将12点重新计算为19。
用程序语言做起来似乎很简单,但用SQL表达起来却很困难,如果有人知道怎么做,请帮忙。
它不能表示为一个简单的聚合函数(sum)

v8wbuo2f

v8wbuo2f1#

我不确定我是否正确理解了你的要求,但是你可以考虑使用窗口函数
将两个相邻时区的值相加。

-- sample data
WITH sample_table AS (
  SELECT 1 hh, 5 val UNION ALL
  SELECT 2 hh, 4 val UNION ALL
  SELECT 3 hh, 3 val UNION ALL
  SELECT 4 hh, 2 val UNION ALL
  SELECT 5 hh, 1 val UNION ALL
  SELECT 6 hh, 5 val UNION ALL
  SELECT 7 hh, 4 val UNION ALL
  SELECT 8 hh, 3 val UNION ALL
  SELECT 9 hh, 3 val UNION ALL
  SELECT 10 hh, 1 val UNION ALL
  SELECT 11 hh, 5 val UNION ALL
  SELECT 12 hh, 4 val
)
-- query starts here
SELECT hh, val, recalculation FROM (
  SELECT *, SUM(val) OVER w AS recalculation
    FROM (
      SELECT 0 AS day, * FROM sample_table UNION ALL
      SELECT 1 AS day, * FROM sample_table UNION ALL
      SELECT 2 AS day, * FROM sample_table
    ) t
  -- below window frame covers adjacent 2 time zones (includes 5 hours).
  WINDOW w AS (ORDER BY day, hh ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
) t 
WHERE day = 1 ORDER BY hh;

-- query result
+----+-----+---------------+
| hh | val | recalculation |
+----+-----+---------------+
|  1 |   5 |            21 |
|  2 |   4 |            18 |
|  3 |   3 |            15 |
|  4 |   2 |            15 |
|  5 |   1 |            15 |
|  6 |   5 |            15 |
|  7 |   4 |            16 |
|  8 |   3 |            16 |
|  9 |   3 |            16 |
| 10 |   1 |            16 |
| 11 |   5 |            18 |
| 12 |   4 |            19 |
+----+-----+---------------+

相关问题