Neo4j LOAD CSV memory pool error: "The memory pool limit was exceeded"

Posted by 3npbholx on 2022-12-18 in Other

I'm trying to load a large CSV into Neo4j 5.2 with cypher-shell and am getting an error I have never seen before. Neo4j runs in Docker, and I use the cypher-shell inside the container (docker exec ... cypher-shell):

Unable to complete transaction.: The memory pool limit was exceeded. The corresponding setting can be found in the error message

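The Neo4j logs are empty, and the heap is 5 times larger than the total file size, so even a single transaction should fit in memory. On top of that, I'm using CALL ... IN TRANSACTIONS, and the query should not run into an Eager operator: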

LOAD CSV WITH HEADERS FROM 'file:///omop/CONCEPT_RELATIONSHIP_clean.csv' AS line FIELDTERMINATOR ','

CALL {
  WITH line
  MATCH (source:Concept { concept_id: line.concept_id_1 })
  MATCH (target:Concept { concept_id: line.concept_id_2 })
  CREATE (source)-[r:VOCAB_REL]->(target)
  SET r.type = line.relationship_id, r.valid_start_date = line.valid_start_date, r.valid_end_date = line.valid_end_date, r.invalid_reason = line.invalid_reason

} IN TRANSACTIONS;

Any idea what is causing this error?


Answer #1 by bihw5rsg

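The default batch size for CALL {} IN TRANSACTIONS is 1,000 rows, which may not be enough for a large transactional load. You can raise it to 5,000 to 10,000 rows.
For example: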

LOAD CSV WITH HEADERS FROM <csvfile> AS line
CALL {
  WITH line
  ...
} IN TRANSACTIONS OF 10000 ROWS
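To make the batching arithmetic concrete, here is a small Python sketch (an illustration only, not how Neo4j implements it) that mimics how IN TRANSACTIONS OF n ROWS groups input rows into separately committed chunks. The function names and the 2,500,000-row figure are hypothetical; 1,000 is the default batch size mentioned above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_rows(rows: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size rows.

    Each chunk stands for one inner transaction of
    CALL { ... } IN TRANSACTIONS OF batch_size ROWS (hypothetical model).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def transaction_count(total_rows: int, batch_size: int = 1000) -> int:
    """Number of inner transactions needed for total_rows rows (ceiling division)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return -(-total_rows // batch_size)


if __name__ == "__main__":
    total = 2_500_000  # hypothetical row count, not the poster's file
    for size in (1000, 5000, 10000):
        print(size, transaction_count(total, size))
    # prints:
    # 1000 2500
    # 5000 500
    # 10000 250
```

Raising the batch size means fewer commits, with more rows handled inside each inner transaction.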

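Reference: https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/#_batching
That said, I'd suggest an alternative solution using the APOC procedure apoc.periodic.iterate. Make sure your CSV file has no duplicate rows. Thanks.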

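// Both statements are passed as strings, so the inner file path uses
// single quotes while the outer strings use double quotes.
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///omop/CONCEPT_RELATIONSHIP_clean.csv' AS line RETURN line",
  "MATCH (source:Concept { concept_id: line.concept_id_1 })
   MATCH (target:Concept { concept_id: line.concept_id_2 })
   CREATE (source)-[r:VOCAB_REL]->(target)
   SET r.type = line.relationship_id, r.valid_start_date = line.valid_start_date,
       r.valid_end_date = line.valid_end_date, r.invalid_reason = line.invalid_reason",
  // Parallel batches that create relationships can contend for locks on
  // shared nodes; set parallel: false if you see deadlock errors.
  {batchSize: 10000, parallel: true}
) YIELD batches, total
RETURN batches, total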
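The duplicate-row check suggested above can be done before loading. Below is a minimal Python sketch using only the standard csv module; find_duplicates is a hypothetical helper, and treating rows as duplicates when the chosen key columns match (all columns by default) is an assumption. The column names in the sample come from the query in the question.

```python
import csv
import io
from collections import Counter
from typing import Dict, Optional, Sequence, TextIO, Tuple


def find_duplicates(
    f: TextIO, key_columns: Optional[Sequence[str]] = None
) -> Dict[Tuple[str, ...], int]:
    """Return {key: count} for every key that occurs more than once.

    The first line is treated as the header (like LOAD CSV WITH HEADERS).
    If key_columns is None, the whole row is the key (an assumption; you may
    only care about e.g. concept_id_1 / concept_id_2 / relationship_id).
    """
    reader = csv.DictReader(f)
    if key_columns is not None:
        columns = list(key_columns)
    else:
        columns = list(reader.fieldnames or [])  # reads the header line
    counts = Counter(tuple(row[c] for c in columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical sample data using column names from the question's query.
    sample = (
        "concept_id_1,concept_id_2,relationship_id\n"
        "1,2,Maps to\n"
        "1,2,Maps to\n"
        "3,4,Is a\n"
    )
    print(find_duplicates(io.StringIO(sample)))
    # prints {('1', '2', 'Maps to'): 2}
```

For the real file, pass a file object opened with newline="" and a suitable encoding. The Counter keeps every distinct key in memory, so a very large file needs correspondingly more RAM on the machine running the check.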
