llama_index [Question]: Does the embedding model support multiple GPUs?

6qqygrtg posted 5 months ago in Other

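Question Validation

I have searched both the documentation and Discord for an answer.

Question

I want to use LlamaIndex's semantic splitter to split documents. Is running the embedding model in parallel across multiple GPUs supported?
Below is the code I use with a single GPU (A40), followed by the error message I get.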

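from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load every file in the directory as documents
dir_path = "/data/test-splitter"
documents = SimpleDirectoryReader(dir_path).load_data()

# Local bge-m3 checkpoint, pinned to the first GPU
embed_dir = '/models/embedding/bge-m3'
embed_model = HuggingFaceEmbedding(model_name=embed_dir, device="cuda:0",
                                   embed_batch_size=1)
splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=50, embed_model=embed_model
)
nodes = splitter.get_nodes_from_documents(documents)
for node in nodes:
    print('-' * 100)
    print(node.get_content())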

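torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.46 GiB. GPU 0 has a total capacity of 44.56 GiB of which 1.17 GiB is free. Process 2617384 has 43.38 GiB memory in use. Of the allocated memory 42.33 GiB is allocated by PyTorch, and 760.95 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)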

llama_index

Source: https://github.com/run-llama/llama_index/issues/14561

2 Answers

new9mtju #1

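Hello, @RuiqingGuo! I'm here to help you with any bugs, questions, or contributions. Let's work through this together!
Yes, embedding model parallelism across multiple GPUs is supported when using LlamaIndex's semantic splitter for document splitting. The provided code includes a tensor parallel configuration (tp_size) and a pipeline parallel configuration (pp_size) for distributing the model across multiple GPUs.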

However, note that the Python runtime does not support pipeline parallelism, so while tensor parallelism is supported, pipeline parallelism is not 1(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/llms/nvidia_tensorrt.py).

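In addition, the HuggingFaceEmbedding model does not explicitly handle multi-GPU setups: it sets the _device attribute to a single device and moves the model onto that one device.

This indicates that the current implementation does not support distributing the model across multiple GPUs 2(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/embeddings/huggingface.py).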
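
One workaround sometimes used when an embedding wrapper is bound to a single device is data parallelism: keep one copy of the model per GPU and split the input texts between them. The sketch below shows only the sharding and reassembly logic. The names `embed_across_devices`, `embed_fn`, and `fake_embed` are hypothetical and not part of LlamaIndex; `embed_fn` stands in for whatever call embeds a batch on a given device (for example, one HuggingFaceEmbedding instance per device). This spreads throughput across GPUs but does not reduce the memory needed for any single input, so on its own it would probably not fix an OOM caused by one very long text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def embed_across_devices(
    texts: Sequence[str],
    devices: Sequence[str],
    embed_fn: Callable[[str, List[str]], List[Vector]],
) -> List[Vector]:
    """Shard texts round-robin over devices; return embeddings in input order.

    embed_fn(device, batch) is a hypothetical, user-supplied callable.
    """
    if not devices:
        raise ValueError("at least one device is required")

    # Shard i receives the texts at positions i, i + n, i + 2n, ...
    n = len(devices)
    index_shards = [list(range(i, len(texts), n)) for i in range(n)]

    def run(shard_no: int) -> List[Vector]:
        batch = [texts[j] for j in index_shards[shard_no]]
        return embed_fn(devices[shard_no], batch) if batch else []

    # One worker thread per device (assumes the model call can run
    # concurrently from several threads).
    with ThreadPoolExecutor(max_workers=n) as pool:
        shard_results = list(pool.map(run, range(n)))

    # Put each vector back at the position of its source text.
    ordered: List[Vector] = [[] for _ in texts]
    for indices, vectors in zip(index_shards, shard_results):
        for j, vec in zip(indices, vectors):
            ordered[j] = vec
    return ordered


def fake_embed(device: str, batch: List[str]) -> List[Vector]:
    # Stand-in for a real model call: encodes only the text length.
    return [[float(len(t))] for t in batch]


vectors = embed_across_devices(["a", "bb", "ccc"], ["cuda:0", "cuda:1"], fake_embed)
```

In a real setup, `fake_embed` would be replaced by a function that looks up the model instance loaded on `device` and calls its batch embedding method.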

For better multi-GPU support, consider other embedding models such as OptimumEmbedding, OpenVINOEmbedding, TogetherEmbedding, IpexLLMEmbedding, OctoAIEmbedding, VoyageEmbedding, or GradientEmbedding 3(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/embeddings/__init__.py) 4(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/command_line/mappings.json).

5 months ago

zed5wv10 #2

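I ran into the same problem with a large PDF (260 pages). When I use SimpleDirectoryReader, it returns one document per page, and SemanticSplitterNodeParser can then embed the text. However, when I run my own OCR and put everything into a single string, SemanticSplitterNodeParser fails with an OOM error.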
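
The observation above suggests keeping each document roughly page-sized before it reaches SemanticSplitterNodeParser. Here is a minimal sketch of such a pre-splitting step, assuming paragraph breaks in the OCR output are marked by blank lines. The names `split_into_chunks` and `max_chars` are hypothetical, chosen for illustration, and the 4000-character default is arbitrary (character count is only a rough proxy for token count).

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 4000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is hard-cut at max_chars.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-cut oversized paragraphs so no piece exceeds the limit.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # Adding this piece would overflow: close the current chunk.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


ocr_text = "first paragraph\n\nsecond paragraph\n\n" + "x" * 50
chunks = split_into_chunks(ocr_text, max_chars=40)
```

Each chunk could then be wrapped as `Document(text=chunk)` (with `Document` imported from `llama_index.core`) and passed to `get_nodes_from_documents`, which mirrors the per-page documents that SimpleDirectoryReader produces.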

5 months ago
