llama_index [Bug]: Error when running index = VectorStoreIndex.from_documents(documents)

nwwlzxa7  posted 5 months ago in Other

Bug description

When I run index = VectorStoreIndex.from_documents(documents), I get TypeError: 'NoneType' object is not iterable.

Version

Latest version

Steps to reproduce

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from dotenv import load_dotenv, find_dotenv
import os

_ = load_dotenv(find_dotenv())

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

When I print the documents, they do contain content.

Relevant logs/traceback

Traceback (most recent call last):
  File "/Users/zhouhao/Projects/AI/AI-full-stack/Lecture-Notes/07-llamaindex/run.py", line 11, in <module>
    index = VectorStoreIndex.from_documents(documents)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 145, in from_documents
    return cls(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 75, in __init__
    super().__init__(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 94, in __init__
    index_struct = self.build_index_from_nodes(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 308, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **insert_kwargs)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 280, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 233, in _add_nodes_to_index
    nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 141, in _get_node_with_embedding
    id_to_embed_map = embed_nodes(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/indices/utils.py", line 138, in embed_nodes
    new_embeddings = embed_model.get_text_embedding_batch(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/base/embeddings/base.py", line 332, in get_text_embedding_batch
    embeddings = self._get_text_embeddings(cur_batch)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/embeddings/openai/base.py", line 429, in _get_text_embeddings
    return get_embeddings(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/llama_index/embeddings/openai/base.py", line 180, in get_embeddings
    data = client.embeddings.create(input=list_of_text, model=engine, **kwargs).data
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/openai/resources/embeddings.py", line 114, in create
    return self._post(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/openai/_base_client.py", line 1250, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/openai/_base_client.py", line 931, in request
    return self._request(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/openai/_base_client.py", line 1032, in _request
    return self._process_response(
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/openai/_base_client.py", line 1126, in _process_response
    return api_response.parse()
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/openai/_response.py", line 313, in parse
    parsed = self._options.post_parser(parsed)
  File "/opt/miniconda3/envs/llamaindex/lib/python3.10/site-packages/openai/resources/embeddings.py", line 102, in parser
    for embedding in obj.data:
TypeError: 'NoneType' object is not iterable

sauutmhj  #1

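Hey, @zhouhao27! I'm here to help you with any bugs or questions. I can also guide you through becoming a contributor. Let's solve this together!
To resolve the TypeError: 'NoneType' object is not iterable raised when creating an index with VectorStoreIndex.from_documents(documents), make sure the documents variable is not None and is iterable (for example, a list of document objects). Here is an example of loading the documents and creating the index correctly: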

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Ensure documents is not None and is a list
if documents is None:
    raise ValueError("No documents were loaded. Please check the data path and loader configuration.")
if not isinstance(documents, list):
    raise TypeError("Documents should be a list of document objects.")

# Create the index
index = VectorStoreIndex.from_documents(documents)

Before passing documents to VectorStoreIndex.from_documents(), make sure it has loaded correctly and is a list of document objects. Sources: 1 (https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_06_02_23.json), 2 (https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/managed/vectara/base.py), 3 (https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/base.py).

To continue the conversation with Dosu, mention @dosu.

bkkx9g8r  #2

@zhouhao27 Can you please share the contents of your "./data" folder? That will help us get to the root cause faster. Thanks!


w9apscun  #3


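The first thing I checked was documents. I printed it, along with its type. It has content, and it is a list.
After I added your code, neither of the errors you added was raised.
My ./data folder contains one .md file and one .pdf file.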


h6my8fg2  #4

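If documents is confirmed to be a list and has content, the problem may be related to how VectorStoreIndex handles different document formats (such as .md and .pdf).

There is a known issue with the VectorStoreIndex class when handling different document formats. Specifically, version 0.6 had a bug where, if documents were added to the index with the insert method, the document IDs were not written to the JSON file when the storage context was persisted. Indexes created with VectorStoreIndex.from_documents, however, were saved correctly 1 (https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_06_02_23.json).

Also, make sure the embed_nodes function is not returning None because the embedding model failed to generate embeddings for the supplied text. This can happen when the embedding model runs into a problem with the document content 2 (https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/utils.py).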
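
One way to test the embedding step on its own, before any index is built, is to call the embedder on a short string and validate what comes back. The sketch below is only an illustration: check_embedding and fake_embed are hypothetical names, and fake_embed is an offline stand-in so the code runs without an API key.

```python
from typing import Callable, List, Sequence


def check_embedding(embed_fn: Callable[[str], Sequence[float]], text: str) -> List[float]:
    """Call embed_fn on text and fail loudly if the result is unusable."""
    vector = embed_fn(text)
    if vector is None:
        raise ValueError("Embedding call returned None")
    vector = list(vector)
    if not vector:
        raise ValueError("Embedding call returned an empty vector")
    if not all(isinstance(x, (int, float)) for x in vector):
        raise TypeError("Embedding contains non-numeric values")
    return vector


# Stand-in embedder so the sketch runs offline (hypothetical, not a real model).
def fake_embed(text: str) -> List[float]:
    return [float(len(text)), 0.5]


print(len(check_embedding(fake_embed, "hello world")))
```

With llama_index installed, the real embedder could presumably be passed in instead, for example check_embedding(Settings.embed_model.get_text_embedding, "hello") after from llama_index.core import Settings. That call goes to the embedding API, so if it fails in the same way as the traceback, the documents themselves are unlikely to be the cause.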

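To debug further, you can try the following steps:

  1. Check the document contents: make sure the contents of the .md and .pdf files were read and processed correctly.

  2. Verify the embeddings: confirm that the embedding model generates embeddings for the document contents correctly.

Here is an updated example that includes additional checks: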

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Ensure documents is not None and is a list
if documents is None:
    raise ValueError("No documents were loaded. Please check the data path and loader configuration.")
if not isinstance(documents, list):
    raise TypeError("Documents should be a list of document objects.")

# Print document contents for debugging
# (a later reply in this thread reports that doc has doc_id / id_ rather than id)
for doc in documents:
    print(f"Document ID: {doc.id}, Content: {doc.get_content()}")

# Create the index
index = VectorStoreIndex.from_documents(documents)


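If the problem persists, consider isolating the problematic document by loading and indexing each document separately, to determine whether a specific file format or piece of content is causing the error.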
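
The isolation step could be sketched roughly as follows. find_failing_items and fake_build are hypothetical helpers written for illustration; fake_build stands in for the real index builder so the loop can be run offline.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def find_failing_items(
    items: Iterable[T], build: Callable[[List[T]], object]
) -> List[Tuple[int, str]]:
    """Run build([item]) for each item and record (position, error) for failures."""
    failures = []
    for position, item in enumerate(items):
        try:
            build([item])
        except Exception as exc:  # broad on purpose: this is a diagnostic loop
            failures.append((position, f"{type(exc).__name__}: {exc}"))
    return failures


# Offline stand-in for the index builder (hypothetical): rejects empty strings.
def fake_build(batch: List[str]) -> int:
    if any(not text for text in batch):
        raise ValueError("empty document")
    return len(batch)


print(find_failing_items(["intro", "", "appendix"], fake_build))
```

In the real script this would be called as find_failing_items(documents, VectorStoreIndex.from_documents). Each call embeds one document, so it costs API requests. If every document fails with the same TypeError, the cause is more likely the embedding request itself than any particular file.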


jq6vz3qz  #5

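I found that doc does not have an id. Instead, it has a doc_id. get_content() returns a lot of text, which looks correct. Is doc_id the cause of the problem?
There is also a field id_, which has the same value as doc_id.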


rxztt3cl  #6

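This is happening inside the openai client, and I don't think it is really related to llama-index. Did you set an API key? Did you change the base URL or anything else?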
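
The last frame of the traceback is the loop for embedding in obj.data:, so obj.data was None when the client parsed the response. One situation that would produce this (an assumption, not something confirmed in this thread) is a response that parses as JSON but has no data field, for example from a proxy or a custom endpoint. The sketch below mimics that loop using only the standard library; parse_embeddings and both response bodies are made up for illustration and are not the openai library's code.

```python
import json
from types import SimpleNamespace


def parse_embeddings(raw_body: str) -> list:
    """Mimic the failing loop: iterate over the response's `data` field."""
    payload = json.loads(raw_body)
    obj = SimpleNamespace(data=payload.get("data"))  # missing key -> None
    return [item["embedding"] for item in obj.data]


good_body = '{"data": [{"embedding": [0.1, 0.2]}]}'
bad_body = '{"error": "upstream failure"}'  # made-up body with no "data" field

print(parse_embeddings(good_body))
try:
    parse_embeddings(bad_body)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The second call raises the same message as the traceback, which is why the key and base URL are worth checking before looking at the documents.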


xqnpmsa8  #7

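> This is happening inside the openai client, and I don't think it is related to llama-index. Did you set an API key? Did you change the base URL or anything else?
I don't think so. If it were an API key problem, I would get a different error. I can reach openai through API calls without any problems.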
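
A quick way to confirm which key and base URL the script actually sees is to inspect the environment after the .env file has been loaded. This is a minimal sketch: describe_openai_env is a hypothetical helper, and the variable names are an assumption based on my understanding that the openai client reads OPENAI_BASE_URL while the llama-index OpenAI integration reads OPENAI_API_BASE.

```python
import os
from typing import Mapping, Optional


def describe_openai_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Summarize OpenAI-related settings without printing the secret itself."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    return {
        "api_key_set": bool(key),
        "api_key_suffix": key[-4:] if key else None,
        "OPENAI_BASE_URL": env.get("OPENAI_BASE_URL"),
        "OPENAI_API_BASE": env.get("OPENAI_API_BASE"),
    }


# Call after load_dotenv(find_dotenv()) so the values from .env are included.
print(describe_openai_env())
```

If either base URL variable is set, for example by a .env file picked up by find_dotenv(), the embedding request may be going to a different server than the one used in the manual API test.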
