Python: iterating over data to merge it

weylhg0b  posted on 2022-12-02 in Python

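I'm still new to Python and have only just started working with data. I'm trying to merge different objects so that the data is displayed in a more readable way and the results are easier to compare.
Here is the data I'm working with: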

{
"flowDefinitionArn": "arn:aws:sagemaker:us-east-1:2345:flow-definition/definition_name",
"humanAnswers": [
    {
        "acceptanceTime": "2022-11-15T18:37:50.085Z",
        "answerContent": {
            "extracted1_1": "Italy",
            "extracted1_2": "Rome",
            "extracted1_3": "5555",
            "extracted2_1": "Czech",
            "extracted2_2": "Prague",
            "extracted2_3": "3333",
            "reportDate": "2022-06-01T08:30",
            "reportOwner": "John Smith"
        },
        "submissionTime": "2022-11-15T18:38:32.791Z",
        "timeSpentInSeconds": 42.706,
        "workerId": "1234",
        "workerMetadata": {
            "identityData": {
                "identityProviderType": "Cognito",
                "issuer": "https://cognito-idp.us-east-1.amazonaws.com/",
                "sub": "c"
            }
        }
    }
],
"humanLoopName": "test",
"inputContent": {
    "document": {
        "documentType": "countryReport",
        "fields": [
            {
                "id": "reportOwner",
                "type": "string",
                "validation": "",
                "value": "John Smith"
            },
            {
                "id": "reportDate",
                "type": "date",
                "validation": "",
                "value": "2022-06-01T08:30"
            },
            {
                "id": "locationList",
                "type": "table",
                "value": {
                    "columns": [
                        {
                            "id": "country",
                            "type": "string"
                        },
                        {
                            "id": "capital",
                            "type": "string"
                        },
                        {
                            "id": "population",
                            "type": "number"
                        }
                    ],
                    "rows": [
                        [
                            "UK",
                            "London",
                            1234
                        ],
                        [
                            "France",
                            "Paris",
                            321
                        ]
                    ]
                }
            }
        ]
    },
    "document_types": [
        {
            "displayName": "Email",
            "id": "email"
        },
        {
            "displayName": "Invoice",
            "id": "invoice"
        },
        {
            "displayName": "Other",
            "id": "other"
        }
    ],
    "input_s3_uri": "s3://my-input-bucket/file1.pdf"
}

}
I would like the data to look like this:

Input info: country, Original answer: UK, Human answer: extracted1_1: Italy

Input info: capital, Original answer: London, Human answer: extracted1_2: Rome

Input info: population, Original answer: 1234, Human answer: extracted1_3: 5555

Input info: country, Original answer: France, Human answer: extracted2_1: Czech

Input info: capital, Original answer: Paris, Human answer: extracted2_2: Prague

Input info: population, Original answer: 321, Human answer: extracted2_3: 3333

Here is the code I have written so far:

s3_client       = boto3.client('s3')
response        = s3_client.get_object(Bucket=f'{config["bucket"]}', Key=f'{config["file_name"]}')
data            = response['Body'].read()
d               = json.loads(data)
column          = d['inputContent']['document']['fields'][2]['value']['columns']
row             = d['inputContent']['document']['fields'][2]['value']['rows']
answers         = d['humanAnswers'][0]['answerContent']
str_row         = str(row)
iter_col        = iter(column)
iter_row        = iter(str_row)
combined        = ''

for a in answers.items():
    nxt_col = next(iter_col)
    for list in row:
        for values in list:
            v = values
            combined += str(v + ", ")

print(f'Input info: {nxt_col}, Original Answer: {str_row}, Human Answer: {a}')

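I'm a bit stuck at this point and am looking for some guidance on how to combine the columns (input info), the rows (original answers), and answerContent (human answers) with their corresponding values.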

kr98yfug  #1

You can try the following:

import json
import pandas as pd

d = json.loads(data)  # `data` is the JSON text read from S3 in the question
table = d['inputContent']['document']['fields'][2]['value']

# Column ids of the input table
cols = [c['id'] for c in table['columns']]
# output --> ['country', 'capital', 'population']

# Human answers whose keys start with 'extra' (extracted1_1, ...), in insertion order
extracted = d['humanAnswers'][0]['answerContent']
extracted_vals = [v for k, v in extracted.items() if k.startswith('extra')]
# output --> ['Italy', 'Rome', '5555', 'Czech', 'Prague', '3333']

# Flatten the table rows into a single list
datacol_rows = [item for row in table['rows'] for item in row]
# output --> ['UK', 'London', 1234, 'France', 'Paris', 321]

# Every len(cols)-th value, starting at offset i, belongs to column i
n = len(cols)
final = pd.DataFrame({'extracted_' + k: extracted_vals[i::n] for i, k in enumerate(cols)})
'''
    extracted_country   extracted_capital   extracted_population
0   Italy               Rome                5555
1   Czech               Prague              3333
'''
final2 = pd.DataFrame({k: datacol_rows[i::n] for i, k in enumerate(cols)})
'''
    country capital population
0   UK      London  1234
1   France  Paris   321
'''
# Join on the row index, then interleave original and extracted columns
final = final.join(final2)
final = final[[name for k in cols for name in (k, 'extracted_' + k)]]
print(final)
# Result, shown here as a Markdown table:
'''
|    | country   | extracted_country   | capital   | extracted_capital   |   population |   extracted_population |
|---:|:----------|:--------------------|:----------|:--------------------|-------------:|-----------------------:|
|  0 | UK        | Italy               | London    | Rome                |         1234 |                   5555 |
|  1 | France    | Czech               | Paris     | Prague              |          321 |                   3333 |
'''
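
If the goal is the exact line-per-value text from the question rather than a DataFrame, a plain-Python sketch along the following lines may be enough. It assumes the human-answer keys always follow the pattern extracted<row>_<column> with 1-based indices, which matches the sample but is not confirmed anywhere else. The function name comparison_lines and the "<missing>" placeholder are made up for illustration, and the payload is a trimmed copy of the question's data so the snippet runs without S3.

```python
import json

# Trimmed copy of the question's payload; only the fields used below are kept.
doc = json.loads("""
{
  "humanAnswers": [{"answerContent": {
      "extracted1_1": "Italy", "extracted1_2": "Rome", "extracted1_3": "5555",
      "extracted2_1": "Czech", "extracted2_2": "Prague", "extracted2_3": "3333",
      "reportDate": "2022-06-01T08:30", "reportOwner": "John Smith"}}],
  "inputContent": {"document": {"fields": [
      {"id": "reportOwner", "type": "string", "value": "John Smith"},
      {"id": "reportDate", "type": "date", "value": "2022-06-01T08:30"},
      {"id": "locationList", "type": "table", "value": {
          "columns": [{"id": "country"}, {"id": "capital"}, {"id": "population"}],
          "rows": [["UK", "London", 1234], ["France", "Paris", 321]]}}]}}
}
""")


def comparison_lines(doc):
    """Build one 'Input info / Original answer / Human answer' line per table cell."""
    fields = doc["inputContent"]["document"]["fields"]
    # Look the table up by its type instead of relying on it being fields[2].
    table = next(f["value"] for f in fields if f["type"] == "table")
    answers = doc["humanAnswers"][0]["answerContent"]
    lines = []
    for r, row in enumerate(table["rows"], start=1):
        for c, (col, original) in enumerate(zip(table["columns"], row), start=1):
            # Assumed key naming: extracted<row>_<column>, both 1-based.
            key = f"extracted{r}_{c}"
            human = answers.get(key, "<missing>")
            lines.append(
                f"Input info: {col['id']}, Original answer: {original}, "
                f"Human answer: {key}: {human}"
            )
    return lines


lines = comparison_lines(doc)
print("\n".join(lines))
```

Building the key from the row and column position avoids depending on the order of the keys in answerContent, and zip pairs each column definition with the value at the same position in the row. With the real file, doc would instead be the result of json.loads on the S3 object body, as in the question.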
