clickhouse—一种高效的方法,可以在时间窗口外立即获取行

wwtsj6pe  于 2021-07-15  发布在  ClickHouse
关注(0)|答案(2)|浏览(516)

我们将时间序列数据存储在clickhouse表中,类似于:

timestamp             value
2020-03-05 11:03:00    2
2020-03-05 11:12:00    3
2020-03-05 11:13:00    4
2020-03-05 11:27:00    5
2020-03-05 11:31:00    6
2020-03-05 11:39:00    7

当可视化这些数据时,我们需要一个时间范围,比如 2020-03-05 11:15:00 - 2020-03-05 11:30:00 .
在这个范围内选择数据是很容易的,但是对于可视化来说更有用的是同时获得两边的点,即:

2020-03-05 11:12:00    3
2020-03-05 11:13:00    4
2020-03-05 11:27:00    5
2020-03-05 11:31:00    6

在clickhouse有没有一个有效的方法来实现这一点?目前,我正在(可能)进行3个独立的查询:
选择范围内的数据:

select * from data where timestamp >= "from" and timestamp <= "to" order by timestamp

如果第一个点的时间戳!“从“时间戳:

select * from data where timestamp < "from" order by timestamp desc limit 1

如果最后一点的时间戳!“至“时间戳:

select * from data where timestamp > "to" order by timestamp limit 1

如果能在一个查询中得到这个,那就太好了。

mum43rcc

mum43rcc1#

看起来只需将所有三个查询合并为一个查询,并稍微更改比较运算符:

SELECT * 
FROM (
  SELECT *
  FROM
  (
      /* test data */
      SELECT data.1 AS timestamp, data.2 AS value
      FROM (SELECT arrayJoin([(toDateTime('2020-03-05 11:03:00'), 2), (toDateTime('2020-03-05 11:12:00'), 3), (toDateTime('2020-03-05 11:13:00'), 4), (toDateTime('2020-03-05 11:27:00'), 5), (toDateTime('2020-03-05 11:31:00'), 6), (toDateTime('2020-03-05 11:39:00'), 7)]) AS data)
  )
  WHERE timestamp > '2020-03-05 11:15:00' AND timestamp < '2020-03-05 11:30:00'
  UNION ALL
  SELECT DISTINCT *
  FROM (
    SELECT *
    FROM
    (
        /* test data */
        SELECT data.1 AS timestamp, data.2 AS value
        FROM (SELECT arrayJoin([(toDateTime('2020-03-05 11:03:00'), 2), (toDateTime('2020-03-05 11:12:00'), 3), (toDateTime('2020-03-05 11:13:00'), 4), (toDateTime('2020-03-05 11:27:00'), 5), (toDateTime('2020-03-05 11:31:00'), 6), (toDateTime('2020-03-05 11:39:00'), 7)]) AS data)
    )
    WHERE timestamp <= '2020-03-05 11:15:00'
    ORDER BY timestamp DESC
    LIMIT 1
    UNION ALL
    SELECT *
    FROM
    (
        /* test data */
        SELECT data.1 AS timestamp, data.2 AS value
        FROM (SELECT arrayJoin([(toDateTime('2020-03-05 11:03:00'), 2), (toDateTime('2020-03-05 11:12:00'), 3), (toDateTime('2020-03-05 11:13:00'), 4), (toDateTime('2020-03-05 11:27:00'), 5), (toDateTime('2020-03-05 11:31:00'), 6), (toDateTime('2020-03-05 11:39:00'), 7)]) AS data)
    )
    WHERE timestamp >= '2020-03-05 11:30:00'
    ORDER BY timestamp ASC
    LIMIT 1))
ORDER BY timestamp;
/* result
┌───────────timestamp─┬─value─┐
│ 2020-03-05 11:13:00 │     4 │
│ 2020-03-05 11:27:00 │     5 │
│ 2020-03-05 11:31:00 │     6 │
└─────────────────────┴───────┘

* /
..
WHERE timestamp > '2020-03-05 11:13:00' AND timestamp < '2020-03-05 11:30:00'
..
/* result
┌───────────timestamp─┬─value─┐
│ 2020-03-05 11:13:00 │     4 │
│ 2020-03-05 11:27:00 │     5 │
│ 2020-03-05 11:31:00 │     6 │
└─────────────────────┴───────┘

* /
..
WHERE timestamp > '2020-03-05 11:15:00' AND timestamp < '2020-03-05 11:31:00'
..
/* result
┌───────────timestamp─┬─value─┐
│ 2020-03-05 11:13:00 │     4 │
│ 2020-03-05 11:27:00 │     5 │
│ 2020-03-05 11:31:00 │     6 │
└─────────────────────┴───────┘

* /
..
WHERE timestamp > '2020-03-05 11:27:00' AND timestamp < '2020-03-05 11:31:00'
..
/* result
┌───────────timestamp─┬─value─┐
│ 2020-03-05 11:27:00 │     5 │
│ 2020-03-05 11:31:00 │     6 │
└─────────────────────┴───────┘

* /
..
WHERE timestamp > '2020-03-05 11:28:00' AND timestamp < '2020-03-05 11:28:00'
..
/* result
┌───────────timestamp─┬─value─┐
│ 2020-03-05 11:27:00 │     5 │
│ 2020-03-05 11:31:00 │     6 │
└─────────────────────┴───────┘

* /
..
WHERE timestamp > '2020-03-05 11:31:00' AND timestamp < '2020-03-05 11:31:00'
..
/*
result
┌───────────timestamp─┬─value─┐
│ 2020-03-05 11:31:00 │     6 │
└─────────────────────┴───────┘

* /
tvokkenx

tvokkenx2#

我会通过一些函数将范围扩展到startoftenminutes('2020-03-05 11:15:00)到startoftenminutes('2020-03-05 11:30:00')+600或-600+600
并在客户端过滤多余的行。因为3个查询比1个慢。

相关问题