窗口,按Hive分区,以获得7天的平均温度

cbjzeqam  于 2021-06-25  发布在  Hive
关注(0)|答案(1)|浏览(288)

我有一个数据集,每天有多个温度读数。我期待着返回最热的7天期间的平均温度。

DROP TABLE IF EXISTS oshkosh; 
CREATE EXTERNAL TABLE IF NOT EXISTS oshkosh(year STRING, month STRING, day STRING, time STRING, temp FLOAT, dewpoint FLOAT, humidity INT, sealevel FLOAT, visibility FLOAT, winddir STRING, windspeed FLOAT, gustspeed FLOAT, precip FLOAT, events STRING, condition STRING, winddirdegrees INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/maria_dev/final/Oshkosh' tblproperties ("skip.header.line.count"="1"); 

SELECT o.theDate
,AVG(o.temp) over (order by o.theDate range INTERVAL 6 DAY preceding) AS Average
FROM
(SELECT CAST(to_date(from_unixtime(UNIX_TIMESTAMP(CONCAT(year,'-',IF(LENGTH(month)=1,CONCAT(0,month),month),'-',IF(LENGTH(day)=1,CONCAT(0,day),day)),'yyyy-MM-dd'))) as timestamp) as theDate
,temp AS temp
FROM oshkosh
WHERE temp != -9999) as o

返回错误:

Error while compiling statement: FAILED: ParseException line 2:38 cannot recognize input near 'range' 'INTERVAL' '6' in window_value_expression

我不确定是否需要时间戳作为o.thedate,因为似乎间隔6天的调用可能找不到新的一天,因为数据集的第一天有28个温度读数(第二天有44个读数,每天都是可变的)。

ej83mcc0

ej83mcc01#

尝试:

SELECT 
    o.theDate,
    AVG(o.temp) over (order by unix_timestamp(o.theDate) range between 604800 
    preceding and current row) AS Average 
FROM
(
SELECT 
    CAST(to_date(CONCAT(year,'-',IF(LENGTH(month)=1,CONCAT(0,month),month), 
    '-',IF(LENGTH(day)=1,CONCAT(0,day),day))) AS TIMESTAMP) as theDate,
    temp AS temp
FROM oshkosh
WHERE temp != -9999
) as o

相关问题