Merging nested dictionaries from dot-separated keys in Python and writing them out as JSON

67up9zun, posted on 2023-06-25 in Python

Introduction to and description of the data structure

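I have extracted metadata from an SEM acquisition. It is organized into three separate dictionaries: acquisitionMetadata, datasetMetadata and imageMetadata. Each dictionary contains key-value pairs in which the key is a dot-separated string representing the hierarchy levels.
acquisitionMetadata is simply a single dictionary as described above.
datasetMetadata is a list of dictionaries with the same structure, where each dictionary holds the metadata of one dataset in the acquisition. imageMetadata is also a list: each element corresponds to one dataset and is itself a list of dictionaries, one per image in that dataset.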

What I need to do

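I need to combine these three dictionaries into one nested dictionary in Python (and ultimately a JSON file) in which the keys express the hierarchy. For example, 'acquisition.dataset.images.creationTime': '18.08.2020 17:51:07' means that I want the value '18.08.2020 17:51:07' stored under acquisition{dataset{images{creationTime: 18.08.2020 17:51:07}}}.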
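The mapping from a dot-separated key to a nested path can be sketched as follows. The helper name `set_by_dotted_key` is hypothetical and not part of the original post; it only illustrates the key-splitting step for a single flat key and does not yet deal with the lists under "dataset" and "images".

```python
def set_by_dotted_key(target, dotted_key, value):
    """Store value in target under the path named by a dot-separated key."""
    *parents, leaf = dotted_key.split('.')
    node = target
    for name in parents:
        # Create each intermediate level on first use, then descend into it.
        node = node.setdefault(name, {})
    node[leaf] = value
    return target


nested = set_by_dotted_key(
    {}, 'acquisition.dataset.images.creationTime', '18.08.2020 17:51:07'
)
print(nested)
# {'acquisition': {'dataset': {'images': {'creationTime': '18.08.2020 17:51:07'}}}}
```

Calling the helper repeatedly on the same target merges keys that share a prefix, because `setdefault` reuses a level that already exists.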

My problem

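The main problem arises with the lists and nested structures. I cannot get the arrays under "dataset" and "images" to be built dynamically the way I want: either the "acquisition", "dataset" and/or "images" keys are repeated inside levels that already sit under them, or the image dictionaries end up outside the dataset array. A chatbot has gotten me close, but no matter how I describe the problem it does not handle this correctly. It also insists on hard-coding the level names/keys, which I do not want.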

Desired end result

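For reference, the combined dictionary (and the output JSON) should have the following structure when built from the variables in my minimal working example (obviously not every key/variable is shown):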

metadata = {
    'acquisition': {
        'genericMetadata': {
            'program': {
                'programName': 'Auto Slice & View 4',
                'programVersion': '4.2.1.1982'
            },
            'applicationId': {
                'identifierValue': 'ASV'
            },
            'fileVersion': '1.2',
            'projectName': '20200818_AlSi13 XRM tomo2',
            'numberOfCuts': '719'
        },
        'dataset': [
            {
                'rows': '1',
                'columns': '1',
                'images': [
                    {
                        'creationTime': '18.08.2020 17:51:07',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403678'
                            }
                        }
                    },
                    {
                        'creationTime': '18.08.2020 18:09:06',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403773'
                            }
                        }
                    }
                ]
            },
            {
                'rows': '1',
                'columns': '1',
                'images': [
                    {
                        'creationTime': '18.08.2020 17:51:07',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403678'
                            }
                        }
                    },
                    {
                        'creationTime': '18.08.2020 18:09:06',
                        'stage': {
                            'workingDistance': {
                                'value': '0.00403773'
                            }
                        }
                    }
                ]
            }
        ]
    }
}

Minimal working example

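Below is a minimal working example showing what the dictionaries I pass to such a function look like. You can copy and paste it into your IDE to recreate the input I am working with.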

acquisition_metadata = {
'acquisition.genericMetadata.program.programName': 'Auto Slice & View 4',
 'acquisition.genericMetadata.program.programVersion': '4.2.1.1982',
 'acquisition.genericMetadata.applicationId.identifierValue': 'ASV',
 'acquisition.genericMetadata.fileVersion': '1.2',
 'acquisition.genericMetadata.projectName': '20200818_AlSi13 XRM tomo2',
 'acquisition.genericMetadata.numberOfCuts': '719',
}

dataset_metadata = [
    {
        'acquisition.dataset.rows': '1',
        'acquisition.dataset.columns': '1',
    },
    {
        'acquisition.dataset.rows': '1',
        'acquisition.dataset.columns': '1',
    },
]

image_metadata = [
    [
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 17:51:07',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403678',
        },
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 18:09:06',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403773',
        }
    ],
    [
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 17:51:07',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403678',
        },
        {
            'acquisition.dataset.images.creationTime': '18.08.2020 18:09:06',
            'acquisition.dataset.images.stage.workingDistance.value': '0.00403773',
        }
    ]
]

What I have tried:

Here is what I have tried (with the help of our friend "Gee Pee Tee"):

import json
import os

def combine_metadata(acquisition_metadata, dataset_metadata, image_metadata):
    metadata = {}
    
    # Combine acquisition metadata
    for key, value in acquisition_metadata.items():
        nested_keys = key.split('.')
        current_dict = metadata
        
        for nested_key in nested_keys[:-1]:
            if nested_key not in current_dict:
                current_dict[nested_key] = {}
            current_dict = current_dict[nested_key]
        
        current_dict[nested_keys[-1]] = value
    
    # Combine dataset metadata
    metadata['acquisition']['dataset'] = []
    for dataset in dataset_metadata:
        dataset_dict = {}
        for key, value in dataset.items():
            nested_keys = key.split('.')
            current_dict = dataset_dict
            
            for nested_key in nested_keys[:-1]:
                if nested_key not in current_dict:
                    current_dict[nested_key] = {}
                current_dict = current_dict[nested_key]
            
            current_dict[nested_keys[-1]] = value
        
        metadata['acquisition']['dataset'].append(dataset_dict)
    
    # Combine image metadata
    for i, images in enumerate(image_metadata):
        metadata['acquisition']['dataset'][i]['images'] = []
        for image in images:
            image_dict = {}
            for key, value in image.items():
                nested_keys = key.split('.')
                current_dict = image_dict
                
                for nested_key in nested_keys[:-1]:
                    if nested_key not in current_dict:
                        current_dict[nested_key] = {}
                    current_dict = current_dict[nested_key]
                
                current_dict[nested_keys[-1]] = value
            
            metadata['acquisition']['dataset'][i]['images'].append(image_dict)
    
    return metadata

def save_metadata_as_json(metadata, save_path):
    filename = os.path.join(save_path, "combined.json")
    with open(filename, 'w') as file:
        json.dump(metadata, file, indent=4)
    print(f"Metadata saved as {filename}")

But it produces output like this:

{
    "acquisition": {
        "genericMetadata": {
            "program": {
                "programName": "Auto Slice & View 4",
                "programVersion": "4.2.1.1982"
            },
            "applicationId": {
                "identifierValue": "ASV"
            },
            "fileVersion": "1.2",
            "projectName": "20200818_AlSi13 XRM tomo2",
            "numberOfCuts": "719"
        },
        "dataset": [
            {
                "acquisition": {
                    "dataset": {
                        "rows": "1",
                        "columns": "1"
                    }
                },
                "images": [
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 17:51:07",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403678"
                                        }
                                    }
                                }
                            }
                        }
                    },
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 18:09:06",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403773"
                                        }
                                    }
                                }
                            }
                        }
                    }
                ]
            },
            {
                "acquisition": {
                    "dataset": {
                        "rows": "1",
                        "columns": "1"
                    }
                },
                "images": [
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 17:51:07",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403678"
                                        }
                                    }
                                }
                            }
                        }
                    },
                    {
                        "acquisition": {
                            "dataset": {
                                "images": {
                                    "creationTime": "18.08.2020 18:09:06",
                                    "stage": {
                                        "workingDistance": {
                                            "value": "0.00403773"
                                        }
                                    }
                                }
                            }
                        }
                    }
                ]
            }
        ]
    }
}

Here you can see the redundant level names I was talking about.
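The redundant levels come from splitting each key in full: the whole 'acquisition.dataset...' path is then rebuilt inside every list element, even though the element's position in the list already implies those levels. The sketch below contrasts the two behaviours; the helper `nest` and its `skip` parameter are hypothetical names introduced only for this illustration.

```python
def nest(flat, skip=0):
    """Nest a flat dict of dotted keys, ignoring the first `skip` components of each key."""
    out = {}
    for key, value in flat.items():
        *parents, leaf = key.split('.')[skip:]
        node = out
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return out


dataset = {'acquisition.dataset.rows': '1', 'acquisition.dataset.columns': '1'}

full = nest(dataset)              # rebuilds 'acquisition' -> 'dataset' inside the element
stripped = nest(dataset, skip=2)  # drops the two levels the list position already implies

print(full)      # {'acquisition': {'dataset': {'rows': '1', 'columns': '1'}}}
print(stripped)  # {'rows': '1', 'columns': '1'}
```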

Summary

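In short, I need to feed the dictionaries above into a function that creates a nested dictionary structure like the one shown above. I will ultimately write this combined dictionary out as a JSON file, so if going straight to JSON output is easier, I will take that as well.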

zkure5ic (answer #1)

You were almost there. Note the nested_keys.remove(...) calls.

# Start from an empty result, as in the question's combine_metadata()
metadata = {}

# Combine acquisition metadata
for key, value in acquisition_metadata.items():
    nested_keys = key.split('.')
    current_dict = metadata
    
    for nested_key in nested_keys[:-1]:
        if nested_key not in current_dict:
            current_dict[nested_key] = {}
        current_dict = current_dict[nested_key]
    
    current_dict[nested_keys[-1]] = value

# Combine dataset metadata
metadata['acquisition']['dataset'] = []
for dataset in dataset_metadata:
    dataset_dict = {}
    for key, value in dataset.items():
        nested_keys = key.split('.')
        nested_keys.remove('acquisition')
        nested_keys.remove('dataset')
        current_dict = dataset_dict
        
        for nested_key in nested_keys[:-1]:
            if nested_key not in current_dict:
                current_dict[nested_key] = {}
            current_dict = current_dict[nested_key]
        
        current_dict[nested_keys[-1]] = value
    
    metadata['acquisition']['dataset'].append(dataset_dict)

# Combine image metadata
for i, images in enumerate(image_metadata):
    metadata['acquisition']['dataset'][i]['images'] = []
    for image in images:
        image_dict = {}
        for key, value in image.items():
            nested_keys = key.split('.')
            nested_keys.remove('acquisition')
            nested_keys.remove('dataset')
            nested_keys.remove('images')
            current_dict = image_dict
            
            for nested_key in nested_keys[:-1]:
                if nested_key not in current_dict:
                    current_dict[nested_key] = {}
                current_dict = current_dict[nested_key]
            
            current_dict[nested_keys[-1]] = value
        
        metadata['acquisition']['dataset'][i]['images'].append(image_dict)
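Since the question asks to avoid hard-coded level names, one possible variation is to pass the location of the lists in as parameters and strip that prefix from each key. This is only a sketch under stated assumptions: the names `nest`, `dataset_path` and `images_key` are hypothetical, every key in a list element is assumed to start with the path of the list it belongs to, and `dataset_metadata` and `image_metadata` are assumed to have the same length.

```python
import json


def nest(flat, prefix=()):
    """Nest a flat dict of dotted keys, dropping `prefix` from the front of each key."""
    out = {}
    for key, value in flat.items():
        parts = key.split('.')
        if tuple(parts[:len(prefix)]) != tuple(prefix):
            raise ValueError(f'{key!r} does not start with {".".join(prefix)!r}')
        *parents, leaf = parts[len(prefix):]
        node = out
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return out


def combine_metadata(acquisition, datasets, images,
                     dataset_path='acquisition.dataset', images_key='images'):
    """Combine the three inputs; list locations are parameters, not literals in the body."""
    if len(datasets) != len(images):
        raise ValueError('expected one list of images per dataset')

    dataset_prefix = tuple(dataset_path.split('.'))
    image_prefix = dataset_prefix + (images_key,)

    metadata = nest(acquisition)

    # Walk to (and create if needed) the parent of the dataset list.
    *parents, list_key = dataset_prefix
    node = metadata
    for name in parents:
        node = node.setdefault(name, {})

    node[list_key] = []
    for dataset, dataset_images in zip(datasets, images):
        entry = nest(dataset, dataset_prefix)
        entry[images_key] = [nest(image, image_prefix) for image in dataset_images]
        node[list_key].append(entry)
    return metadata


# Reduced version of the question's input, to keep the example short.
acquisition_metadata = {'acquisition.genericMetadata.fileVersion': '1.2'}
dataset_metadata = [{'acquisition.dataset.rows': '1'}]
image_metadata = [[
    {'acquisition.dataset.images.creationTime': '18.08.2020 17:51:07',
     'acquisition.dataset.images.stage.workingDistance.value': '0.00403678'},
]]

combined = combine_metadata(acquisition_metadata, dataset_metadata, image_metadata)
print(json.dumps(combined, indent=4))
```

The level names still have to be stated somewhere, but here they appear once as default arguments, so a different hierarchy only needs different arguments rather than edits to the function body.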
