neo4j抱怨隐式分组表达式

wwodge7n  于 2023-02-19  发布在  其他
关注(0)|答案(3)|浏览(252)

我从neo4j密码聚合类https://www.youtube.com/watch?v=wfMTg0ujVjk复制了一个示例类查询

Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), sum(x)/NumActors

但是,在类提供的neo4j Web控制台中,我得到了这个错误

Aggregation column contains implicit grouping expressions. For example, in 'RETURN n.a, n.a + n.b + count(*)' the aggregation expression 'n.a + n.b + count(*)' includes the implicit grouping key 'n.b'. It may be possible to rewrite the query by extracting these grouping/aggregation expressions into a preceding WITH clause. Illegal expression(s): NumActors (line 6, column 8 (offset: 183))
"Return sum(x), sum(x)/NumActors"
        ^

显然,此查询有效

Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x)

因此,当我们使用NumActors作为隐式聚合键sum(x)时,它会抱怨。我如何通过更新查询(计算sum和average)来实现相同的目标?我的初始查询语法看起来非常好...

btxsgosb

btxsgosb1#

为此,请使用函数AVG():

Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with  collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), avg(x)

样品结果:

╒══════════╤═════════╕
│"sum(x)"  │"avg(x)" │
╞══════════╪═════════╡
│P276Y5M14D│P69Y1M11D│
└──────────┴─────────┘

参考:https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-avg-duration
为了解释您的错误,neo4j允许聚合表达式,如果它符合此准则中所述的一些要求:https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#grouping-keys
如果你坚持使用你的原始查询,这是neo4j错误信息中所述的修复方法。你应该先使用WITH来获取sum(),然后用它来计算平均值。

Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
UNWIND Ages as x
WITH sum(x) as total,  NumActors
RETURN total, total/NumActors as avg
62o28rlo

62o28rlo2#

这是一个有点棘手的问题。所以我会尽可能地简化它。让我们试着理解聚合是如何工作的,对于这个查询:

Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x)

我们首先匹配名字以Tom开头并有出生日期的演员,假设我们有10个演员,在MATCHWHERE子句执行之后。

with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages

这将执行两个聚合操作,countcollect,但请注意,我们没有专门指定任何分组键。因此,此查询的输出将是一个包含两列NumActorsAges的单行。现在,我们展开age数组,因此现在我们将有10行,最后,我们返回总和。因为没有明确的分组键,所以计算所有数据行的和。2因此它可以工作。
让我们考虑一下您的查询:

Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), sum(x)/NumActors

UNWIND stage之前的所有内容都与上面相同。现在,请注意return语句:

Return sum(x), sum(x)/NumActors

这里也没有指定显式分组键。因此,sum(x)将包含所有行,但术语sum(x)/NumActors是错误的,因为Neo4j无法确定要从哪一行中选取NumActors的值,因为它没有被指定为分组键。这就是为什么会出现错误。消息如下:

Aggregation column contains implicit grouping expressions. For example, in 'RETURN n.a, n.a + n.b + count(*)' the aggregation expression 'n.a + n.b + count(*)' includes the implicit grouping key 'n.b'. It may be possible to rewrite the query by extracting these grouping/aggregation expressions into a preceding WITH clause. Illegal expression(s): NumActors (line 6, column 8 (offset: 183))
"Return sum(x), sum(x)/NumActors"

很明显,键n.b正在分组,它不能用在聚合表达式中。类似地,这里NumActors正在分组,我们不能用在聚合表达式中。试试这个:

Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), sum(x)/COLLECT(DISTINCT NumActors)[0]
pgky5nke

pgky5nke3#

该错误是由于您在RETURN子句中使用了NumActors,但没有使其成为显式分组键而导致的。

选项1

指定NumActors作为显式分组键:

...
RETURN sum(x), sum(x)/NumActors, NumActors

选项2

不要在RETURN子句中进行聚合:

MATCH (a:Actor)
WHERE a.born IS NOT NULL AND a.name STARTS WITH 'Tom'
WITH COUNT(a) AS numActors, SUM(duration.between(DATE(a.born), DATE())) as sumAges
RETURN sumAges, sumAges/numActors

注意:上面的代码还通过在第一个(现在也是唯一的)WITH子句中执行SUM来简化(并加快)查询,您不需要浪费的COLLECT-〉UNWIND-〉SUM序列。

选项3

如果不需要返回所有年龄的总和,直接计算平均年龄即可:

MATCH (a:Actor)
WHERE a.born IS NOT NULL AND a.name STARTS WITH 'Tom'
RETURN AVG(duration.between(DATE(a.born), DATE())) as avgAge

附录

还有一个性能改进可以用于所有的解决方案,它还可以消除一个可能的错误。
我们应该计算一次DATE()的值,将其放入一个变量中,然后使用该变量计算所有的年龄,这样不仅可以避免重复调用DATE(),而且可以保证所有的年龄都使用同一个当前日期计算。
如果没有这个调整,那么我们就有可能在计算过程中使用多个当前日期,这将给我们带来不一致的结果。
例如,选项3可以变成这样:

WITH DATE() AS now
MATCH (a:Actor)
WHERE a.born IS NOT NULL AND a.name STARTS WITH 'Tom'
RETURN AVG(duration.between(DATE(a.born), now)) as avgAge

相关问题