Bug Description
When we use dense_x with Elasticsearch, especially with documents of 70+ pages, we hit the following error:
raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 59 seconds. Please go here:
https://aka.ms/oai/quotaincrease
if you would like to further increase the default rate limit.'}}
Version
llama-index==0.10.10
Steps to Reproduce
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
# from dense_base import DenseXRetrievalPack
from dense_pack2.base import DenseXRetrievalPack as dp2
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

def define_rag():
    """Define the RAG pipeline."""
    vector_store = ElasticsearchStore(
        index_name="dense_index_58",
        es_url="http://localhost:9200",
    )
    documents = SimpleDirectoryReader("./docs").load_data()
    llm = llm_query()  # our own helper (not shown) that returns the LLM
    embed_model = embeded_model()  # our own helper (not shown) that returns the embedding model
    print("creating denseX")
    retriever_chunk = dp2(
        documents,
        proposition_llm=embed_model,
        query_llm=llm,
        text_splitter=SentenceSplitter(chunk_size=1024),
        vector_store=vector_store,
    )
    query_engine_chunk = retriever_chunk.query_engine
    print("denseX creation done")
In dense_x we modified some of the code in base.py and import it as dp2, as shown in the code above:
import asyncio
import json
from typing import Any, Dict, List, Optional

import yaml

from llama_index.core import Document, ServiceContext, StorageContext, VectorStoreIndex
from llama_index.core.async_utils import run_jobs
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.base.response.schema import RESPONSE_TYPE
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.llms import LLM  # needed for the LLM type hints below
from llama_index.core.node_parser.interface import TextSplitter
from llama_index.core.node_parser.text import SentenceSplitter
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.schema import IndexNode, TextNode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.elasticsearch import ElasticsearchStore  # needed for the vector_store type hint

llm = llm_query()  # our own helper (not shown)
embed_model = embeded_model()  # our own helper (not shown)

class DenseXRetrievalPack(BaseLlamaPack):
    def __init__(
        self,
        documents: List[Document],
        proposition_llm: Optional[LLM] = None,
        query_llm: Optional[LLM] = None,
        embed_model: Optional[BaseEmbedding] = None,
        text_splitter: TextSplitter = SentenceSplitter(),
        vector_store: Optional[ElasticsearchStore] = None,
        similarity_top_k: int = 4,
    ) -> None:
        """Init params."""
        self._proposition_llm = llm  # uses the module-level LLM, not the proposition_llm argument
        # self._proposition_llm = proposition_llm or OpenAI(
        #     model="gpt-3.5-turbo",
        #     temperature=0.1,
        #     max_tokens=750,
        # )
        embed_model = embed_model  # or OpenAIEmbedding(embed_batch_size=128)
Relevant Logs/Traceback
_No response_
2 Answers
bttbmeg01#
To resolve the rate-limit error you hit when using dense_x with Elasticsearch, especially when processing 80+ pages, you can have the indexing slow down so that it stays within the rate limit. Here is the relevant excerpt from the documentation: to slow requests down, you can implement a rate limiter in your code. Here is an example of how to do this in Python:
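A minimal sketch of such a rate limiter, assuming a sliding-window limit; the 3-calls-per-60-seconds values and the nodes / generate_propositions names are illustrative placeholders, not values taken from your Azure tier:

import time

class RateLimiter:
    """Allow at most max_calls calls per period seconds (sliding window)."""

    def __init__(self, max_calls: int = 3, period: float = 60.0) -> None:
        self.max_calls = max_calls
        self.period = period
        self._timestamps: list[float] = []

    def wait(self) -> None:
        """Block until another call is allowed under the limit."""
        now = time.monotonic()
        # Keep only the timestamps that fall inside the current window.
        self._timestamps = [t for t in self._timestamps if now - t < self.period]
        if len(self._timestamps) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            time.sleep(self.period - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())

limiter = RateLimiter(max_calls=3, period=60.0)
for node in nodes:  # hypothetical: the chunks/nodes being indexed
    limiter.wait()
    generate_propositions(node)  # hypothetical: the call that hits the OpenAI API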
This code will ensure that your requests do not exceed the rate limit set by the API. [1] https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_05_25_23.json
p4tfgftt2#
You can modify the llama-pack code; specifically, num_workers controls how many calls are made at the same time.
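As a minimal sketch, assuming llama-index's run_jobs helper (where the concurrency parameter is named workers in recent versions) and using fake_llm_call as a stand-in for the pack's per-node proposition calls:

import asyncio
from llama_index.core.async_utils import run_jobs

async def fake_llm_call(i: int) -> str:
    """Stand-in for one proposition-generation request to the LLM."""
    await asyncio.sleep(0.1)
    return f"proposition {i}"

# run_jobs keeps at most `workers` coroutines in flight at once, so
# lowering it reduces the number of concurrent ChatCompletions requests.
results = asyncio.run(
    run_jobs(
        [fake_llm_call(i) for i in range(20)],
        workers=2,  # assumption: lowered from the pack's default to avoid 429s
    )
)
print(len(results))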