ChatGPT-3 LangChain - Sequential chain with multiple inputs

xxe27gdn  posted on 2023-08-07 in Other

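I'm working with LangChain, so my question may not be relevant, but I'm having a hard time finding an example in the documentation.
As far as I understand, SequentialChain takes one or more inputs for the first chain and then feeds the output of chain n-1 into chain n.
Suppose I'm using 3 chains: the first takes a snippet of a csv file as input and produces a description of where the csv comes from; the next takes the same snippet of our csv file plus the output of the first chain as input, and generates a Python script.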
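The data flow described above can be sketched in plain Python, without LangChain: each step reads named variables from a shared dictionary and writes its result under its own key, so a later step can use both the original input and an earlier output. The step functions and the `run_steps` helper are hypothetical stand-ins for LLM calls, not LangChain APIs.

```python
from typing import Callable, Dict, List, Tuple

# A step is (input keys, output key, function). Hypothetical structure for illustration.
Step = Tuple[List[str], str, Callable[..., str]]


def run_steps(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, accumulating every output in one shared dict."""
    variables = dict(inputs)
    for input_keys, output_key, func in steps:
        missing = [k for k in input_keys if k not in variables]
        if missing:
            raise KeyError(f"missing inputs for {output_key!r}: {missing}")
        variables[output_key] = func(*(variables[k] for k in input_keys))
    return variables


def describe_metrics(data_snippet: str) -> str:
    # Stand-in for the first LLM call.
    return f"metrics for [{data_snippet}]"


def write_script(data_snippet: str, metrics: str) -> str:
    # Stand-in for the second LLM call; it needs BOTH values.
    return f"script using [{data_snippet}] and [{metrics}]"


steps: List[Step] = [
    (["data_snippet"], "metrics", describe_metrics),
    (["data_snippet", "metrics"], "script", write_script),
]
result = run_steps(steps, {"data_snippet": "id,price"})
```

The point of the sketch is that the second step declares two input keys, which a pipeline that only forwards a single string from one step to the next cannot express.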
Here is the working "non-sequential" version:

DATA_REVIEW = """ You are a datascientist specialized in business analysis. You are able to retrieve the most relevant metrics in every json file. You are able to give complete and detailed review of how thoses metrics can be used for making profit. A snippet of the full Json is given as context. Your role is to write down all type of metrics that can be retrieved from the full json. Don't do the calculation, the metrics list will be send to a python developer. You also should include metrics that can be used for comparison.

after the metrics list, write the columns name list. 

context:
{data}

Metrics that can be retrieved from the full json:
"""
PYTHON_SCRIPT = """You are a datascientist specialized in business analysis. You are able to write powerfull and efficient python code to retrieve metrics from a dataset. Your role is to write a python script for all type of metrics described above based on the structure of the dataset. Your python script should print all metrics calculated and 
each products followed by their whole metrics. You should always use pandas library. After you printed out all the metrics, store them as in the example below:
metrics_result = f'Total number of products: (total_products)'
metrics_result += f'Average price of products: (avg_price)'
for index, row in df.iterrows():
    metrics_result += f'Product ID: (row["product_id"])'
    metrics_result += f'Product Name: (row["product_name"])'

Make sure to replace unwanted character for each column and to convert value to the desired type before going into calculation. Also pay attention to the columns exact name. Data are represented as a json below but the file they came from is an xlsx. Your code should always start with :


structure:
{data}

Metrics to retrieve:
{output}

python script:

"""
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# First call: ask the model for the list of metrics.
prompt_template = PromptTemplate(
    input_variables=['data'],
    template=DATA_REVIEW,
)
openai = OpenAI(model_name="text-davinci-003", openai_api_key='KEY', temperature=0, max_tokens=3000)
output = openai(prompt_template.format(data=data))

# Second call: pass both the data and the first output to the script prompt.
python_script_template = PromptTemplate(
    input_variables=['data', 'output'],
    template=PYTHON_SCRIPT,
)
script = openai(python_script_template.format(output=output, data=data))

#Actual sequential chain script 'not working' 

llm = OpenAI(temperature=0.0)

prompt = PromptTemplate(
    input_variables=["data_snippet"],
    template="""You are a datascientist specialized in business analysis. You are able to retrieve the most relevant metrics in every json file. You are able to give complete and detailed review of how thoses metrics can be used for making profit. Your next project is for a Beauty e-shop business. a snippet of the full Json is given as context. Your role is to write down all type of metrics that can be retrieved from the full json. You also should include metrics that can be used for comparison.
    context:
        {data_snippet}
    
    metrics that can be retrieved from the complete file:
"""
)

chain = LLMChain(llm=llm, prompt=prompt, output_key='metrics')

data_snippet = read_csv_data(csv_file_path)

data_snippet_str = str(data_snippet)
metrics = chain.run(data_snippet_str)
second_prompt = PromptTemplate(
    input_variables=["data_snippet", "metrics"],
    template=
"""You are a datascientist specialized in business analysis. You are able to write powerfull and efficient python code to retrieve metrics from a dataset. Your role is to write a python script for all type of metrics described above based on the structure of the dataset. Your python script should print all metrics calculated and 
    each products followed by their whole metrics. You should always use pandas library. After you printed out all the metrics, store them as in the example below:
        metrics_result = f'Total number of products: (total_products)'
        metrics_result += f'Average price of products: (avg_price)'
        for index, row in df.iterrows():
            metrics_result += f'Product ID: (row["product_id"])'
            metrics_result += f'Product Name: (row["product_name"])'

    Make sure to replace unwanted character for each column and to convert value to the desired type before going into calculation. Also pay attention to the columns exact name. Data are represented as a json below but the file they came from is an xlsx. Your code should always start with :
        import pandas as pd
        data = CSV_FILE
        df = pd.read_csv(data)

    structure:
        {data_snippet}

    Metrics to retrieve:
        {metrics}

    python script:
"""
)

chain_two = LLMChain(llm=llm, prompt=second_prompt, output_key='script')

from langchain.chains import SimpleSequentialChain

overall_chain = SimpleSequentialChain(chains=[chain, chain_two], input_variables=['data_snippet_str'], output_variables=["metrics","script"], verbose=True)

python_script = overall_chain.run([data_snippet_str, chain_two])


pn9klfpd  #1

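I've already spent four hours trying to solve this. It makes me question the complexity LangChain adds to something that should be so simple: passing multiple values between chains.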

0wi1tuuw  #2

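The code you posted fails with validation errors.
The first two validation errors occur because SimpleSequentialChain has no named parameters output_variables or input_variables.
The third validation error is that the prompt template should have only one input, but second_prompt has two input variables: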

second_prompt = PromptTemplate(
    input_variables=["data_snippet", "metrics"],
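A quick way to see the mismatch is to count the placeholders in each template. The sketch below uses only Python's standard `string.Formatter`; the shortened template strings and the `placeholder_names` helper are illustrative assumptions, not LangChain code.

```python
from string import Formatter
from typing import List


def placeholder_names(template: str) -> List[str]:
    """Return the distinct {name} fields in a format-style template, in order."""
    names: List[str] = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


# Shortened versions of the two prompts from the question.
first_template = "context:\n{data_snippet}\n\nmetrics:"
second_template = (
    "structure:\n{data_snippet}\n\n"
    "Metrics to retrieve:\n{metrics}\n\npython script:"
)

first_inputs = placeholder_names(first_template)
second_inputs = placeholder_names(second_template)

# A chain that pipes a single string between steps can only fill one placeholder.
fits_single_input = [len(names) == 1 for names in (first_inputs, second_inputs)]
```

Only the first template has exactly one placeholder, which is why the second prompt is rejected by a chain that expects a single input per step.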

If you remove metrics from input_variables and from the template itself:

Metrics to retrieve:
        # remove this from template
        {metrics}


it will work.
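Removing `{metrics}` makes the error go away, but the second prompt then no longer sees the first chain's output. If both values are needed, one option is `SequentialChain`, which, unlike `SimpleSequentialChain`, accepts `input_variables` and `output_variables` and passes named values between chains. A minimal sketch, assuming the 2023-era `langchain` package used in the question; `build_overall_chain` is a hypothetical helper name and the prompt texts are shortened placeholders:

```python
def build_overall_chain(llm):
    """Wire two LLMChains so the second receives data_snippet AND metrics."""
    # Imported inside the function so the sketch loads even where langchain
    # is not installed; these import paths assume the legacy 0.0.x layout.
    from langchain.chains import LLMChain, SequentialChain
    from langchain.prompts import PromptTemplate

    first_prompt = PromptTemplate(
        input_variables=["data_snippet"],
        template="context:\n{data_snippet}\n\nmetrics that can be retrieved:",
    )
    second_prompt = PromptTemplate(
        input_variables=["data_snippet", "metrics"],
        template=(
            "structure:\n{data_snippet}\n\n"
            "Metrics to retrieve:\n{metrics}\n\npython script:"
        ),
    )
    chain = LLMChain(llm=llm, prompt=first_prompt, output_key="metrics")
    chain_two = LLMChain(llm=llm, prompt=second_prompt, output_key="script")

    # input_variables must match the prompt variable name ("data_snippet"),
    # not the name of the Python variable that holds the text.
    return SequentialChain(
        chains=[chain, chain_two],
        input_variables=["data_snippet"],
        output_variables=["metrics", "script"],
        verbose=True,
    )
```

With an `llm` such as the `OpenAI(temperature=0.0)` instance from the question, calling `build_overall_chain(llm)({"data_snippet": data_snippet_str})` should return a dict that includes the `metrics` and `script` keys.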

