llama_index [Bug]: NebulaGraph - empty-string object in upsert_triplet()

vql8enpb  posted 4 months ago in Other

Bug description

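I am trying to generate a NebulaGraph knowledge graph from code data in an xlsx file with the columns function_name, file_name, and function_definition. However, upsert_triplet() fails because the object vertex being created is an empty string (see the logs below).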

Version

0.10.44

Steps to reproduce

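  • Load the code data from an xlsx file with the columns function_name, file_name, and function_definition.
  • Generate the knowledge graph from the code data with KnowledgeGraphIndex.from_documents(). The error occurs in this step.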

Relevant logs/traceback

DEBUG:llama_index.graph_stores.nebula.nebula_graph_store:upsert_triplet()
DML query: INSERT VERTEX `entity`(name)   VALUES "Generate_telemetry_data":("Generate_telemetry_data");INSERT VERTEX `entity`(name)   VALUES "Model_name":("Model_name");INSERT EDGE `relationship`(`relationship`)   VALUES "Generate_telemetry_data"->"Model_name"@1155094724010351908:("Has parameter");
upsert_triplet()
DML query: INSERT VERTEX `entity`(name)   VALUES "Generate_telemetry_data":("Generate_telemetry_data");INSERT VERTEX `entity`(name)   VALUES "Model_name":("Model_name");INSERT EDGE `relationship`(`relationship`)   VALUES "Generate_telemetry_data"->"Model_name"@1155094724010351908:("Has parameter");
test 111...
obj:  
subj:  Generate_telemetry_data
rel:  Has default value
************
Traceback (most recent call last):
  File "/home/aion/Hung/change_impact_analysis/CIA_nebulagraph_demo.py", line 447, in <module>
    _load_doc_from_excel()
  File "/home/aion/Hung/change_impact_analysis/CIA_nebulagraph_demo.py", line 367, in _load_doc_from_excel
    genGraph_from_doc(code_method_docs, 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/Hung/change_impact_analysis/CIA_nebulagraph_demo.py", line 131, in genGraph_from_doc
    kg_index = KnowledgeGraphIndex.from_documents(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 145, in from_documents
    return cls(
           ^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/knowledge_graph/base.py", line 99, in __init__
    super().__init__(
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 94, in __init__
    index_struct = self.build_index_from_nodes(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 212, in build_index_from_nodes
    return self._build_index_from_nodes(nodes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/knowledge_graph/base.py", line 211, in _build_index_from_nodes
    self.upsert_triplet(triplet)
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/core/indices/knowledge_graph/base.py", line 259, in upsert_triplet
    self._graph_store.upsert_triplet(*triplet)
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/graph_stores/nebula/nebula_graph_store.py", line 499, in upsert_triplet
    obj = escape_str(obj)
          ^^^^^^^^^^^^^^^
  File "/home/aion/anaconda3/envs/cia_/lib/python3.11/site-packages/llama_index/graph_stores/nebula/nebula_graph_store.py", line 103, in escape_str
    if value[0] == " " or value[-1] == " ":
       ~~~~~^^^
IndexError: string index out of range

acruukt9  1#

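Hey @hungbui-io! I'm here to help you with any bugs, questions, or contributions. Let's work through this together!
To resolve the error in upsert_triplet(), make sure the obj argument is not an empty string. The error occurs because obj is empty, so accessing its first or last character raises an IndexError.
Here is how to modify the code to check for empty strings before calling upsert_triplet: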

import pandas as pd
from llama_index.graph_stores.nebula import NebulaGraphStore

# Load the xlsx file
df = pd.read_excel("path_to_your_file.xlsx")

# Initialize the NebulaGraphStore (schema names match the DML query in the log above)
nebula_store = NebulaGraphStore(
    space_name="your_space_name",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
    tag_prop_names=["name"],
)

# Iterate through the rows of the dataframe and upsert triplets
for _, row in df.iterrows():
    function_name = row["function_name"]
    file_name = row["file_name"]

    # Empty cells are read as NaN, which is truthy, so test for them explicitly
    if pd.isna(function_name) or pd.isna(file_name):
        continue

    # Ensure neither subject nor object is an empty or blank string
    subj, obj = str(function_name).strip(), str(file_name).strip()
    if subj and obj:
        nebula_store.upsert_triplet(subj, "defined_in", obj)

This code checks that function_name and file_name are non-empty before calling upsert_triplet. Adjust the condition to your specific needs so that no empty string is ever passed to the upsert_triplet method. 1(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/knowledge_graph/base.py) 2(https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/graph_stores/llama-index-graph-stores-nebula/llama_index/graph_stores/nebula/nebula_graph_store.py) 3(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/graph_stores/nebulagraph.py)
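For reference, the IndexError from the traceback can be reproduced in isolation. The sketch below copies only the single condition visible in the traceback (value[0] == " " or value[-1] == " "). The function names has_edge_space and has_edge_space_safe are hypothetical and not part of llama_index, and the rest of escape_str is not shown in this thread, so it is not reproduced here.

```python
def has_edge_space(value: str) -> bool:
    """Same condition as the failing line in the traceback."""
    return value[0] == " " or value[-1] == " "


def has_edge_space_safe(value: str) -> bool:
    """Guarded variant: an empty string short-circuits before any indexing."""
    return bool(value) and (value[0] == " " or value[-1] == " ")


try:
    has_edge_space("")  # obj was an empty string in the log above
except IndexError as exc:
    print(f"IndexError: {exc}")

print(has_edge_space_safe(""))             # no exception for the empty string
print(has_edge_space_safe(" Model_name"))  # leading space is still detected
```

Indexing an empty string always raises, so any caller that may receive an empty subject, relation, or object needs a truthiness check before touching value[0] or value[-1].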


prdp8dxp  2#

Hmm, I think it should filter out empty entities/relations.
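Until the library filters these itself, a caller-side guard is one possible workaround. The following is a minimal sketch, not llama_index code: is_valid_triplet, SkipEmptyTripletStore, and RecordingStore are hypothetical names, and the wrapper only assumes that the wrapped object exposes upsert_triplet(subj, rel, obj) as seen in the traceback. Whether KnowledgeGraphIndex accepts such a duck-typed wrapper in place of a real graph store is an assumption that would need to be verified.

```python
from typing import Any, List, Tuple


def is_valid_triplet(subj: Any, rel: Any, obj: Any) -> bool:
    """True only if all three parts are non-blank strings."""
    return all(isinstance(p, str) and p.strip() for p in (subj, rel, obj))


class SkipEmptyTripletStore:
    """Hypothetical wrapper that drops incomplete triplets before the store sees them."""

    def __init__(self, store: Any) -> None:
        self._store = store
        self.skipped: List[Tuple[Any, Any, Any]] = []

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        if not is_valid_triplet(subj, rel, obj):
            self.skipped.append((subj, rel, obj))  # keep for later inspection
            return
        self._store.upsert_triplet(subj, rel, obj)

    def __getattr__(self, name: str) -> Any:
        # Delegate every other attribute to the wrapped store.
        return getattr(self._store, name)


class RecordingStore:
    """Stand-in for a real graph store, used only in this demo."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str, str]] = []

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        self.rows.append((subj, rel, obj))


store = SkipEmptyTripletStore(RecordingStore())
store.upsert_triplet("Generate_telemetry_data", "Has parameter", "Model_name")
store.upsert_triplet("Generate_telemetry_data", "Has default value", "")  # as in the log
print(store.rows)     # only the complete triplet was forwarded
print(store.skipped)  # the triplet with the empty object was held back
```

Recording the skipped triplets instead of silently discarding them makes it easier to see how often the extraction step produces an empty subject, relation, or object.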
