llama_index [Bug]: openai rate limit error

gcxthw6b · posted 5 months ago in Other

Bug Description

When we use dense_x with Elasticsearch, especially when indexing 70+ pages, we get the following error:

raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 59 seconds. Please go here:
https://aka.ms/oai/quotaincrease
if you would like to further increase the default rate limit.'}}

Version

llama-index==0.10.10

Steps to Reproduce

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
# from dense_base import DenseXRetrievalPack
from dense_pack2.base import DenseXRetrievalPack as dp2
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
def define_rag():
    """
    This function is for defining the RAG
    """
    vector_store = ElasticsearchStore(
        index_name="dense_index_58",
        es_url="http://localhost:9200",
    )
    documents = SimpleDirectoryReader("./docs").load_data()

    llm = llm_query()
    embed_model = embeded_model()

    print("creating denseX")
    retriever_chunk = dp2(
        documents,
        proposition_llm=embed_model,
        query_llm=llm,
        text_splitter=SentenceSplitter(chunk_size=1024),
        vector_store=vector_store,
    )
    query_engine_chunk = retriever_chunk.query_engine
    print("denseX creation done")

In dense_x, we modified some code in base.py and imported it as dp2, as shown in the code above:

import asyncio
import json
from typing import Any, Dict, List, Optional

import yaml
from llama_index.core import Document, ServiceContext, VectorStoreIndex, StorageContext
from llama_index.core.async_utils import run_jobs
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.base.response.schema import RESPONSE_TYPE
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.node_parser.interface import TextSplitter
from llama_index.core.node_parser.text import SentenceSplitter
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.schema import IndexNode, TextNode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import LLM
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

llm = llm_query()              # user-defined helper returning the query LLM
embed_model = embeded_model()  # user-defined helper returning the embedding model
class DenseXRetrievalPack(BaseLlamaPack):
    def __init__(
        self,
        documents: List[Document],
        proposition_llm: Optional[LLM] = None,
        query_llm: Optional[LLM] = None,
        embed_model: Optional[BaseEmbedding] = None,
        text_splitter: TextSplitter = SentenceSplitter(),
        vector_store: Optional[ElasticsearchStore] = None,
        similarity_top_k: int = 4
    ) -> None:
        """Init params."""
        self._proposition_llm = llm
        # self._proposition_llm = proposition_llm or OpenAI(
        #     model="gpt-3.5-turbo",
        #     temperature=0.1,
        #     max_tokens=750,
        # )

        embed_model = embed_model #or OpenAIEmbedding(embed_batch_size=128)

Relevant Logs/Tracebacks

No response
bttbmeg0 1#

To resolve the rate limit error when using dense_x with Elasticsearch, especially when processing 80+ pages, you can make GPT Index slow down so that it stays within the rate limits. Here is an excerpt from the relevant part of the documentation:

"content": "Hi. I'm using SimpleDirectoryReader to index a folder containing a couple dozen rather short markdown files.\n\nI got this the first time while still using my free trial tokens:\n```\nopenai.error.RateLimitError: Rate limit reached for default-global-with-image-limits in organization org-[...] on requests per min. Limit: 60 / min. Current: 80 / min. Contact support@openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.\n```\n\nI added payment information to my account, and I'm still getting a similar error. Is there a way to make GPT Index slow down to follow rate limits?\n\nOn the topic, two other questions:\n\nIs it possible to determine how much data is broken up into each request? Or how many overall requests are sent.\n\nAnd is it possible to index once, stop, then continue indexing later?\n\nFinally, if my information source changes, do I have to re-index everything, or is there a way to apply a diff?\n\nJust some thoughts and questions before sleeping, thanks for this amazing library \ud83e\udd29",

To slow down requests, you can implement a rate limiter in your code. Here is an example of how to do this in Python:

import time

def rate_limited_request(request_function, max_requests_per_minute):
    # Space out successive calls so they stay under the per-minute limit.
    interval = 60 / max_requests_per_minute
    def wrapper(*args, **kwargs):
        time.sleep(interval)
        return request_function(*args, **kwargs)
    return wrapper

# Example usage with a hypothetical request function
def make_request():
    # Your request logic here
    pass

# Set the rate limit to 60 requests per minute
rate_limited_make_request = rate_limited_request(make_request, 60)

# Use the rate-limited function
for _ in range(80):  # Example loop to process 80+ pages
    rate_limited_make_request()

This code ensures that your requests do not exceed the rate limit set by the API (source: https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_05_25_23.json).
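
Beyond throttling, the Azure error above explicitly says to retry after 59 seconds, so you can also catch openai.RateLimitError and retry after a pause. A minimal sketch, assuming the openai>=1.0 client; call_with_backoff, its parameters, and the llm.complete usage in the comment are illustrative, not part of the pack:

import time
import openai

def call_with_backoff(fn, *args, max_retries=5, wait_seconds=60, **kwargs):
    # Retry fn on HTTP 429, waiting long enough for the Azure quota window to reset.
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(wait_seconds)

# Example usage: wrap any synchronous LLM call that may hit the 429
# response = call_with_backoff(llm.complete, "Summarize this page...")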

To continue the conversation with Dosu, mention @dosu.
p4tfgftt 2#

You can modify the llama-pack code, specifically:

def _gen_propositions(self, nodes: List[TextNode]) -> List[TextNode]:
    """Get propositions."""
    sub_nodes = asyncio.run(
        run_jobs(
            [self._aget_proposition(node) for node in nodes],
            show_progress=True,
            workers=8,
        )
    )

    # Flatten list
    return [node for sub_node in sub_nodes for node in sub_node]

The workers argument passed to run_jobs controls how many proposition calls are made concurrently; lowering it slows the request rate.
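
For example, a minimal sketch of that change in the modified base.py, assuming the rest of _gen_propositions stays as above (workers=1 is only an illustration; pick a value that fits your Azure quota):

def _gen_propositions(self, nodes: List[TextNode]) -> List[TextNode]:
    """Get propositions sequentially to stay under the rate limit."""
    sub_nodes = asyncio.run(
        run_jobs(
            [self._aget_proposition(node) for node in nodes],
            show_progress=True,
            workers=1,  # was 8; fewer workers means fewer concurrent requests
        )
    )
    # Flatten list
    return [node for sub_node in sub_nodes for node in sub_node]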
