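I'm still new to Python and have only just started working with data. I'm trying to combine different objects and display the data in a more readable way so that I can compare the results.
Here is the data I'm working with: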
{
  "flowDefinitionArn": "arn:aws:sagemaker:us-east-1:2345:flow-definition/definition_name",
  "humanAnswers": [
    {
      "acceptanceTime": "2022-11-15T18:37:50.085Z",
      "answerContent": {
        "extracted1_1": "Italy",
        "extracted1_2": "Rome",
        "extracted1_3": "5555",
        "extracted2_1": "Czech",
        "extracted2_2": "Prague",
        "extracted2_3": "3333",
        "reportDate": "2022-06-01T08:30",
        "reportOwner": "John Smith"
      },
      "submissionTime": "2022-11-15T18:38:32.791Z",
      "timeSpentInSeconds": 42.706,
      "workerId": "1234",
      "workerMetadata": {
        "identityData": {
          "identityProviderType": "Cognito",
          "issuer": "https://cognito-idp.us-east-1.amazonaws.com/",
          "sub": "c"
        }
      }
    }
  ],
  "humanLoopName": "test",
  "inputContent": {
    "document": {
      "documentType": "countryReport",
      "fields": [
        {
          "id": "reportOwner",
          "type": "string",
          "validation": "",
          "value": "John Smith"
        },
        {
          "id": "reportDate",
          "type": "date",
          "validation": "",
          "value": "2022-06-01T08:30"
        },
        {
          "id": "locationList",
          "type": "table",
          "value": {
            "columns": [
              {
                "id": "country",
                "type": "string"
              },
              {
                "id": "capital",
                "type": "string"
              },
              {
                "id": "population",
                "type": "number"
              }
            ],
            "rows": [
              [
                "UK",
                "London",
                1234
              ],
              [
                "France",
                "Paris",
                321
              ]
            ]
          }
        }
      ]
    },
    "document_types": [
      {
        "displayName": "Email",
        "id": "email"
      },
      {
        "displayName": "Invoice",
        "id": "invoice"
      },
      {
        "displayName": "Other",
        "id": "other"
      }
    ],
    "input_s3_uri": "s3://my-input-bucket/file1.pdf"
  }
}
I would like the data to look like this:
Input info: country, Original answer: UK, Human answer: extracted1_1: Italy
Input info: capital, Original answer: London, Human answer: extracted1_2: Rome
Input info: population, Original answer: 1234, Human answer: extracted1_3: 5555
Input info: country, Original answer: France, Human answer: extracted2_1: Czech
Input info: capital, Original answer: Paris, Human answer: extracted2_2: Prague
Input info: population, Original answer: 321, Human answer: extracted2_3: 3333
Here is the code I have written so far:
import json

import boto3

# `config` is defined elsewhere and holds the bucket and file names.
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket=f'{config["bucket"]}', Key=f'{config["file_name"]}')
data = response['Body'].read()
d = json.loads(data)
column = d['inputContent']['document']['fields'][2]['value']['columns']
row = d['inputContent']['document']['fields'][2]['value']['rows']
answers = d['humanAnswers'][0]['answerContent']
str_row = str(row)
iter_col = iter(column)
iter_row = iter(str_row)
combined = ''
for a in answers.items():
    nxt_col = next(iter_col)
    for list in row:
        for values in list:
            v = values
            combined += str(v + ", ")
    print(f'Input info: {nxt_col}, Original Answer: {str_row}, Human Answer: {a}')
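I'm a bit stuck at this point and am looking for some guidance on how to combine the columns (input info), the rows (original answers), and answerContent (human answers) with their corresponding values.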
1 Answer
You can try the following:
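A minimal sketch of one possible approach, not necessarily the only one. It assumes the table is the field whose id is "locationList" and that the human-answer keys follow the pattern extracted<row>_<column>, with both numbers starting at 1, as they do in the sample data. The function name compare_answers and the trimmed sample dict are made up for illustration; the sample stands in for the file that would be read from S3.

```python
def compare_answers(d):
    """Pair each table cell with its matching human answer; return the output lines."""
    fields = d['inputContent']['document']['fields']
    # Look the table up by id instead of relying on it being fields[2]
    # (assumption: the id is always "locationList").
    table = next(f['value'] for f in fields if f['id'] == 'locationList')
    columns = [col['id'] for col in table['columns']]
    answers = d['humanAnswers'][0]['answerContent']

    lines = []
    for r, row in enumerate(table['rows'], start=1):
        for c, (col, original) in enumerate(zip(columns, row), start=1):
            # Assumed key pattern: extracted<row>_<column>, both 1-based.
            key = f'extracted{r}_{c}'
            human = answers.get(key, '<missing>')
            lines.append(
                f'Input info: {col}, Original answer: {original}, '
                f'Human answer: {key}: {human}'
            )
    return lines


# Trimmed copy of the data from the question, used here in place of the S3 file.
sample = {
    'humanAnswers': [{
        'answerContent': {
            'extracted1_1': 'Italy', 'extracted1_2': 'Rome', 'extracted1_3': '5555',
            'extracted2_1': 'Czech', 'extracted2_2': 'Prague', 'extracted2_3': '3333',
            'reportDate': '2022-06-01T08:30', 'reportOwner': 'John Smith',
        },
    }],
    'inputContent': {'document': {'fields': [
        {'id': 'reportOwner', 'type': 'string', 'validation': '', 'value': 'John Smith'},
        {'id': 'reportDate', 'type': 'date', 'validation': '', 'value': '2022-06-01T08:30'},
        {'id': 'locationList', 'type': 'table', 'value': {
            'columns': [
                {'id': 'country', 'type': 'string'},
                {'id': 'capital', 'type': 'string'},
                {'id': 'population', 'type': 'number'},
            ],
            'rows': [['UK', 'London', 1234], ['France', 'Paris', 321]],
        }},
    ]}},
}

for line in compare_answers(sample):
    print(line)
```

With the real file, the same function can be called on the parsed object from the question, for example compare_answers(json.loads(response['Body'].read())). Walking the rows and columns with enumerate and zip avoids the manual iterators, and building the answer key from the row and column numbers means the extra reportDate and reportOwner entries in answerContent are simply skipped.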