JSON malformed error for batch inference job input - Amazon Personalize

Posted by xyhw6mcr on 2023-05-30 in Other


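I created a solution version in Amazon Personalize using the "similar-items" recipe and am trying to test it with a batch inference job. I followed the AWS documentation, which says the input should be a list of itemIds, with at most 500 items and each itemId on its own line: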

{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...

So I wrote the following code to convert the item_ids column into the JSON format described:

# convert item_id column to required JSON format with new lines entered between items
items_json = items_df['ITEM_ID'][1:200].to_json(orient='columns').replace(',','}\n{')

# write output to json file
with open('items_json.json', 'w') as f:
    json.dump(items_json, f)

# write file to S3
from io import StringIO
import s3fs

# Connect to S3 default profile
s3 = boto3.client('s3')

s3.put_object(
    Body=json.dumps(items_json),
    Bucket='bucket',
    Key='personalize/batch-recommendations-input/items_json.json'
)

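When I then run the batch inference job with this file as input, it fails with the following error: "User error: Input JSON is malformed."
My sample JSON input looks like this: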

"{"itemId":"12637"} {"itemId":"12931"} {"itemId":"13005"}"

After copying it to S3 it looks like this (backslashes were added); I don't know whether that matters:

"{\"itemId\":\"12637\"}\n{\"itemId\":\"12931\"}\n{\"itemId\":\"13005\"}"

To me, my format looks very close to what they ask for. Any clue what could be causing the error?

JSON

Source: https://stackoverflow.com/questions/76356828/json-malformed-error-for-batch-inference-job-input-amazon-personalize

1 Answer


mbyulnm01#

You only need to make a couple of small changes to how you call to_json. Specifically, orient should be "records" and lines should be True.
Full example:

import pandas as pd
import boto3

items_df = pd.read_csv("...")

# Make sure item ID column name is "itemId"
item_ids_df = items_df.rename(columns={"ITEM_ID": "itemId"})[["itemId"]]

# Write df to file in JSON lines format
item_ids_df.to_json("job_input.json", orient="records", lines=True)

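# Upload to S3 ("bucket" is the placeholder bucket name used in the question; replace it with your own)
bucket = "bucket"
boto3.Session().resource('s3').Bucket(bucket).Object("job_input.json").upload_file("job_input.json")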
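
As for where the backslashes in the question come from: items_json is already a string, so passing it through json.dump / json.dumps serializes it a second time. The result is one quoted JSON string with escaped quotes and a literal \n, not one object per line. The sketch below reproduces this with the standard library only. The three IDs are the sample values from the question, and the variable names are made up for illustration.

```python
import json

# Sample item IDs taken from the question
item_ids = ["12637", "12931", "13005"]

# JSON Lines: one compact JSON object per line
json_lines = "\n".join(
    json.dumps({"itemId": item_id}, separators=(",", ":")) for item_id in item_ids
)

# What the question's code effectively does: serialize the
# already-serialized string a second time
double_encoded = json.dumps(json_lines)

print(json_lines)      # three lines, each a standalone JSON object
print(double_encoded)  # a single quoted string with \" and \n escapes
```

If you would rather keep your original approach, writing the string as it is (for example f.write(json_lines), and Body=json_lines.encode("utf-8") in put_object) instead of going through json.dump / json.dumps should avoid the second round of encoding.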

Finally, you mentioned that the maximum number of input items is 500. In fact, your input file can contain up to 50M input items or be up to 1 GB in size.
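
Before uploading, it can help to check the file locally. Below is a minimal sketch of such a check, not an official validator: check_batch_input is a hypothetical helper, the two limits are simply the figures quoted in this answer (with 1 GB taken as 1024**3 bytes, which is an assumption), and the rules Personalize actually applies may differ.

```python
import json
import os

# Limits as quoted in this answer; assumptions to verify against the current AWS docs
MAX_ITEMS = 50_000_000
MAX_BYTES = 1024 ** 3  # "1 GB" interpreted as 1024**3 bytes (assumption)


def check_batch_input(path, key="itemId"):
    """Return a list of problems found in a JSON Lines batch input file."""
    problems = []

    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_BYTES} byte limit")

    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                # Blank lines are skipped here; whether the service
                # tolerates them is not checked by this sketch
                continue
            count += 1
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: not valid JSON ({exc.msg})")
                continue
            # Each line must be a JSON object carrying the expected key
            if not isinstance(record, dict) or key not in record:
                problems.append(f"line {lineno}: expected an object with key {key!r}")

    if count > MAX_ITEMS:
        problems.append(f"{count} input items, above the {MAX_ITEMS} item limit")
    return problems
```

Calling check_batch_input("job_input.json") on the file written by the example above should return an empty list. A double-encoded file like the one in the question would be reported at line 1, because the whole file parses as a single JSON string and not as an object.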

Answered 2023-05-30
