How to group a sequenced dataset based on the first value of each sequence in SQL?
For example, I have the following dataset:
id  name   key  metric
1   alice  a    0       <- key = 'a', start of a sequence
2   alice  b    1
3   alice  b    1
-----------------
4   alice  a    1       <- key = 'a', start of a sequence
5   alice  b    0
6   alice  b    0
7   alice  b    0
-----------------
8   bob    a    1       <- key = 'a', start of a sequence
9   bob    b    1
-----------------
10  bob    a    0       <- key = 'a', start of a sequence
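Rows with key = 'a' start a new group. For example, I want to sum the metric of the starting row and all subsequent rows until I reach another row with key = 'a' or a different name.
The dataset is sorted by id.
The final result should be: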
id  name   metric
1   alice  2
4   alice  1
8   bob    2
10  bob    0
Here is the equivalent operation in JavaScript, but I would like to get the same result with a SQL query.
data.reduce((acc, a) => {
  if (a.key === 'a') {
    // key = 'a' starts a new group
    return [{id: a.id, name: a.name, metric: a.metric}].concat(acc)
  } else {
    // because the data is sorted,
    // all the subsequent rows with key = 'b' belong to the latest group
    const [head, ...tail] = acc
    const head_updated = {...head, metric: head.metric + a.metric}
    return [head_updated, ...tail]
  }
}, [])
.reverse()
Example SQL dataset:
with dataset as (
    select
        1 as id
        , 'alice' as name
        , 'a' as key
        , 0 as metric
    union select
        2 as id
        , 'alice' as name
        , 'b' as key
        , 1 as metric
    union select
        3 as id
        , 'alice' as name
        , 'b' as key
        , 1 as metric
    union select
        4 as id
        , 'alice' as name
        , 'a' as key
        , 1 as metric
    union select
        5 as id
        , 'alice' as name
        , 'b' as key
        , 0 as metric
    union select
        6 as id
        , 'alice' as name
        , 'b' as key
        , 0 as metric
    union select
        7 as id
        , 'alice' as name
        , 'b' as key
        , 0 as metric
    union select
        8 as id
        , 'bob' as name
        , 'a' as key
        , 1 as metric
    union select
        9 as id
        , 'bob' as name
        , 'b' as key
        , 1 as metric
    union select
        10 as id
        , 'bob' as name
        , 'a' as key
        , 0 as metric
)
select * from dataset
order by name, id
2 answers
rkkpypqq #1
You can use the window function sum() to create the groups and then aggregate. See the demo.
Result:
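The demo's query is not shown here. A minimal sketch of the approach, assuming PostgreSQL syntax: a running sum() that counts the key = 'a' rows within each name labels every row with its group, and an outer aggregate then sums metric per group. The CTE name marked and the column grp are made up for illustration, and the question's sample rows are restated as a VALUES list so the query runs on its own.

```sql
with dataset as (
    -- the ten sample rows from the question
    select * from (values
        (1,  'alice', 'a', 0),
        (2,  'alice', 'b', 1),
        (3,  'alice', 'b', 1),
        (4,  'alice', 'a', 1),
        (5,  'alice', 'b', 0),
        (6,  'alice', 'b', 0),
        (7,  'alice', 'b', 0),
        (8,  'bob',   'a', 1),
        (9,  'bob',   'b', 1),
        (10, 'bob',   'a', 0)
    ) as t(id, name, key, metric)
),
marked as (
    select
        id
        , name
        , metric
        -- running count of key = 'a' rows per name, in id order;
        -- it increases by one at every row that starts a sequence
        , sum(case when key = 'a' then 1 else 0 end)
            over (partition by name order by id) as grp
    from dataset
)
select
    min(id) as id          -- id of the row that started the group
    , name
    , sum(metric) as metric
from marked
group by name, grp
order by min(id);
```

On the question's sample data this should return the four expected rows: (1, alice, 2), (4, alice, 1), (8, bob, 2), (10, bob, 0).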
inkz8wg9 #2
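Based on what the OP wrote in the comments, the query has to look like this:
However, if you can add an index on name and id, you can really improve the query's performance, because no sort operation is then needed before the aggregation.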
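As a sketch of what such an index could look like, assuming PostgreSQL and assuming the rows live in a physical table named dataset (a CTE cannot be indexed). The table definition, the index name dataset_name_id_idx, and the subquery alias marked are illustrative, not taken from the answer.

```sql
-- Hypothetical table holding the question's columns.
create table dataset (
    id     integer primary key,
    name   text    not null,
    key    text    not null,
    metric integer not null
);

-- Composite index matching the window's "partition by name order by id",
-- so the planner may read rows already in that order instead of sorting.
create index dataset_name_id_idx on dataset (name, id);

-- Inspect the plan the planner actually chooses for the grouping query.
explain analyze
select
    min(id) as id
    , name
    , sum(metric) as metric
from (
    select
        id
        , name
        , metric
        , sum(case when key = 'a' then 1 else 0 end)
            over (partition by name order by id) as grp
    from dataset
) as marked
group by name, grp;
```

Whether the planner uses the index depends on table size and statistics, so it is worth comparing the explain analyze output before and after creating it.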
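Testing with a table of one million rows, this is the explain analyze output without the index:
After creating the index, the query plan changes to the following:
Index:
Query plan:
Old answer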
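Based on the JavaScript code, you do not want to partition by name, or group by name in the outer query. In that case you actually end up with a better query, one that only needs the primary index, assuming the id column is indexed. Here it is on a dataset with 1 million rows: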
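The output that followed is not preserved here. As a sketch of the variant this older answer describes, assuming PostgreSQL syntax: the window orders by id only, so a group starts at every key = 'a' row regardless of name, mirroring the JavaScript reduce. The aliases marked and grp are made up for illustration, and taking name from the key = 'a' row is an assumption about how the outer query would report it.

```sql
with dataset as (
    -- the ten sample rows from the question
    select * from (values
        (1,  'alice', 'a', 0),
        (2,  'alice', 'b', 1),
        (3,  'alice', 'b', 1),
        (4,  'alice', 'a', 1),
        (5,  'alice', 'b', 0),
        (6,  'alice', 'b', 0),
        (7,  'alice', 'b', 0),
        (8,  'bob',   'a', 1),
        (9,  'bob',   'b', 1),
        (10, 'bob',   'a', 0)
    ) as t(id, name, key, metric)
)
select
    min(id) as id
    -- name of the key = 'a' row that opened the group
    , max(case when key = 'a' then name end) as name
    , sum(metric) as metric
from (
    select
        id
        , name
        , key
        , metric
        -- no "partition by name": only the id order matters
        , sum(case when key = 'a' then 1 else 0 end)
            over (order by id) as grp
    from dataset
) as marked
group by grp
order by min(id);
```

On the sample data this gives the same four rows as the expected result, because every change of name there coincides with a key = 'a' row. The two variants differ only when a new name begins with a row whose key is not 'a'.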