Trying to implement Jaccard similarity in Neo4j

Posted by tvz2xvvm on 2022-12-29 in Other

I'm trying to build a Jaccard similarity using gds.nodesimilarity in Neo4j, but it gives me an error in the nested loop. This is the code I wrote:

with m.title as watched, m1.title as recommended, collect(distinct k.keywordId) as kou, size(collect(distinct k.name)) as skou where skou>=2 return watched, recommended, kou, skou
foreach(i in kou | foreach(j in kou | (gds.similarity.jaccard(j,i)))) AS jaccardSimilarity
 order by skou DESC
liwlm1x9 (answer 1#)

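Besides the syntax error in the nested FOREACH, the arguments to the Jaccard similarity function are lists of categorical values, not the individual numeric values in your variable kou.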

For example, the Jaccard similarity between [1, 2, 3, 4, 6] and [1, 3, 6] is 0.60: the two lists share 3 elements out of 5 distinct elements overall, and 3/5 = 0.60.

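But you are computing it between 1 and 1, between 1 and 3, between 1 and 6, and so on, that is, between single elements rather than between two lists.
Below is a working example query using the Movie Recommendation dataset in Neo4j. It computes the Jaccard similarity between the movie "Toy Story" (my favorite) and other movies, based on their genres.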
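To see why the arguments must be whole lists, here is a small Python sketch of the Jaccard formula |A ∩ B| / |A ∪ B|, applied once to the two example lists and once element by element, which is roughly what the nested FOREACH attempts. The helper name `jaccard` and its 0.0 result for two empty inputs are assumptions of this sketch, not documented behaviour of gds.similarity.jaccard; wrapping each scalar in a one-element list is also only an illustration, since the GDS function itself expects two lists.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets: |A & B| / |A | B|.
    Assumption of this sketch: two empty inputs give 0.0."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Correct usage: compare two whole lists.
print(jaccard([1, 2, 3, 4, 6], [1, 3, 6]))  # 3 shared / 5 distinct

# Element-by-element comparison, as in the nested FOREACH: each "list" holds a
# single value, so every result is just 1.0 (same value) or 0.0 (different value).
kou = [1, 3, 6]
pairwise = [[jaccard([i], [j]) for j in kou] for i in kou]
print(pairwise)
```

The pairwise matrix only says whether two values are equal, which is why comparing elements of the same list carries no similarity information.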

MATCH (a:Movie{title: 'Toy Story'}), (b:Movie) where a <> b 
WITH a, b limit 10
CALL { 
     WITH a, b  
     MATCH (a)-[:IN_GENRE]-> (g1:Genre) 
     RETURN  a.title as titleA,  b.title as titleB, collect(id(g1)) as countA, [] as countB
  UNION ALL
     WITH a, b
     MATCH (b)-[:IN_GENRE]-> (g2:Genre)  
     RETURN  a.title as titleA,  b.title as titleB, [] as countA, collect(id(g2)) as countB  
} 
WITH titleA, titleB, apoc.coll.flatten(collect(countA)) as countA, apoc.coll.flatten(collect(countB)) as countB  
RETURN titleA, titleB, gds.similarity.jaccard(countA, countB) as jaccard order by jaccard desc

Result:

╒═══════════╤═════════════════════════════╤═══════════════════╕
│"titleA"   │"titleB"                     │"jaccard"          │
╞═══════════╪═════════════════════════════╪═══════════════════╡
│"Toy Story"│"Jumanji"                    │0.6                │
├───────────┼─────────────────────────────┼───────────────────┤
│"Toy Story"│"Father of the Bride Part II"│0.2                │
├───────────┼─────────────────────────────┼───────────────────┤
│"Toy Story"│"Grumpier Old Men"           │0.16666666666666666│
├───────────┼─────────────────────────────┼───────────────────┤
│"Toy Story"│"Waiting to Exhale"          │0.14285714285714285│
├───────────┼─────────────────────────────┼───────────────────┤
│"Toy Story"│"Heat"                       │0.0                │
└───────────┴─────────────────────────────┴───────────────────┘
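The query above follows a simple pipeline: collect each movie's genres into a list, skip the movie itself (WHERE a <> b), compute one Jaccard value per pair, and sort descending. As a minimal Python sketch of that same pipeline, assuming plain Python sets in place of gds.similarity.jaccard: the movie names and genre lists below are hypothetical placeholders, not data from the Movie Recommendation dataset, and `rank_by_jaccard` is an illustrative helper name.

```python
def jaccard(a, b):
    """|A & B| / |A | B| over the distinct elements; 0.0 if both are empty (assumption)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical genre lists, standing in for collect(id(g)) per movie.
genres = {
    "movie_a": ["Adventure", "Animation", "Comedy"],
    "movie_b": ["Adventure", "Comedy"],
    "movie_c": ["Drama"],
    "movie_d": ["Comedy", "Romance"],
}


def rank_by_jaccard(target, genres):
    """Compare target with every other movie, highest similarity first
    (like ORDER BY jaccard DESC in the Cypher query)."""
    base = genres[target]
    rows = [
        (target, other, jaccard(base, other_genres))
        for other, other_genres in genres.items()
        if other != target  # mirrors WHERE a <> b
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for row in rank_by_jaccard("movie_a", genres):
    print(row)
```

Each pair is scored from two complete lists, which is the same shape of input the Cypher query builds with collect and apoc.coll.flatten before calling gds.similarity.jaccard.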
6qqygrtg (answer 2#)

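I don't think you can use the gds.similarity functions inside FOREACH. FOREACH is meant for updating data for each element of a list: its body may only contain updating clauses such as CREATE, MERGE, SET, REMOVE and DELETE. See https://neo4j.com/docs/cypher-manual/current/clauses/foreach/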
