Python - How do I move a nested JSON dictionary up into its own index?

7uhlpewt  posted on 2023-02-18  in Python

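I have a JSON dataset in which each item/index can contain two nested dictionaries. The problem is that one of these nested dictionaries contains all the same key:value pairs as its parent dictionary. In other words, I have a parent "account", and whenever it has "sub-accounts", those sub-accounts are placed in a nested dictionary; they are never treated as separate items/indexes of their own.
Below is a JSON example of one item/index. Essentially, I need to extract the sub_accounts objects and make each one its own index. As you can see, they contain all the same key:value pairs as the parent object that holds sub_accounts.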

{
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": [
            {
                "classification": [
                    {
                        "classificationId": "Cash",
                        "taxonomyId": "accounting.gp"
                    }
                ],
                "id": "236",
                "kind": "Real",
                "name": "Cash Reserve",
                "sub_accounts": []
            }
        ]
    },

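I have been able to flatten the data using json_normalize and even variations of .pop(), and I have explored other flattening options, but with no luck on the specific task I am trying to accomplish. These solutions usually just leave the sub-accounts still tied to the original index.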

flvlnr44 #1

You can use a recursive function that walks the hierarchy, popping the "sub_accounts" key as it goes:

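def extractAccounts(accounts):
    # Emit each account followed by its flattened descendants.
    # Note: pop() removes "sub_accounts" from the original dictionaries.
    return [s for a in accounts
              for s in (a, *extractAccounts(a.pop("sub_accounts", [])))]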

Given a list of account objects:

data =  [{
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": [
            {
                "classification": [
                    {
                        "classificationId": "Cash",
                        "taxonomyId": "accounting.gp"
                    }
                ],
                "id": "236",
                "kind": "Real",
                "name": "Cash Reserve",
                "sub_accounts": []
            }
        ]
    }]

Output:

accounts = extractAccounts(data)
for i,account in enumerate(accounts):
    print("Account #",i)
    print(account)

Account # 0
{'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '235', 'kind': 'Real', 'name': 'Checking'}
Account # 1
{'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '236', 'kind': 'Real', 'name': 'Cash Reserve'}
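Because the recursive approach above uses pop(), the input dictionaries lose their "sub_accounts" key. If the original data needs to stay intact, a variant along these lines might work. This is only a sketch: the name `flatten_accounts` and the trimmed-down sample data are made up for illustration.

```python
def flatten_accounts(accounts):
    """Return a flat list of accounts without modifying the input.

    Each returned dict is a shallow copy of the original account with
    the "sub_accounts" key left out.
    """
    flat = []
    for account in accounts:
        # Copy every key except "sub_accounts" so the caller's data is untouched.
        flat.append({k: v for k, v in account.items() if k != "sub_accounts"})
        # Recurse into the children; a missing key is treated as "no children".
        flat.extend(flatten_accounts(account.get("sub_accounts", [])))
    return flat


# Hypothetical, shortened sample data.
nested = [{"id": "235", "name": "Checking",
           "sub_accounts": [{"id": "236", "name": "Cash Reserve",
                             "sub_accounts": []}]}]

flat = flatten_accounts(nested)
print([a["id"] for a in flat])       # ['235', '236']
print("sub_accounts" in nested[0])   # True: the input still has its children
```

The copies are shallow, so nested values such as the "classification" lists are still shared with the input.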
  • If your top level is a single account (i.e. not a list), simply wrap it in a list when calling extractAccounts.
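As an illustration of the single-account case, the call might look like the sketch below. The function is repeated from the answer so the snippet runs on its own; `single_account` is a made-up, shortened sample.

```python
def extractAccounts(accounts):
    # Same function as in the answer, repeated so this snippet is self-contained.
    return [s for a in accounts
              for s in (a, *extractAccounts(a.pop("sub_accounts", [])))]


# Hypothetical top-level object that is a single account rather than a list.
single_account = {
    "id": "235",
    "name": "Checking",
    "sub_accounts": [{"id": "236", "name": "Cash Reserve", "sub_accounts": []}],
}

accounts = extractAccounts([single_account])  # wrap the dict in a list
print([a["name"] for a in accounts])  # ['Checking', 'Cash Reserve']
```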
ar7v8xwq #2

I don't have a general answer, but this seems to do what you need:

raw_data = """
[
    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": [
            {
                "classification": [
                    {
                        "classificationId": "Cash",
                        "taxonomyId": "accounting.gp"
                    }
                ],
                "id": "236",
                "kind": "Real",
                "name": "Cash Reserve",
                "sub_accounts": []
            }
        ]
    }
]
"""

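import json
jdict = json.loads(raw_data)

result = list()
for elem in jdict:
    # Detach the children, then leave a fresh empty list in their place.
    sub_elem_list = elem['sub_accounts']
    elem['sub_accounts'] = []  # new list per account, not one shared object
    result.append(elem)        # parent first
    for sub_elem in sub_elem_list:
        result.append(sub_elem)  # then its direct children (one level only)

print(json.dumps(result, indent=4))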

output = """
[
    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": []
    },
    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "236",
        "kind": "Real",
        "name": "Cash Reserve",
        "sub_accounts": []
    }
]

"""

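When you have nested structures, you need nested loops. The other answer is recursive, which could cause problems if you nest more than about a thousand recursive calls (Python's default recursion limit is 1000), though that is probably not the case here. I also assumed that you care about order and want the parent ID to come first. Finally, if you are trying to remove sub_accounts from the JSON, you may want to pop it from each record, but again I assumed that this structure should be preserved.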
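If deeper nesting is a concern, one way to combine both ideas is to replace the recursion with an explicit stack, which handles any depth without hitting the recursion limit and still puts each parent before its children. The following is only a sketch under those assumptions; `flatten_iterative` and the sample data are hypothetical names, not part of either answer.

```python
def flatten_iterative(accounts):
    """Flatten nested accounts in parent-first order using an explicit stack."""
    result = []
    # Reverse so that the first account ends up on top of the stack.
    stack = list(reversed(accounts))
    while stack:
        account = stack.pop()
        children = account.get("sub_accounts", [])
        # Keep the "sub_accounts" key but empty it, as in the loop-based answer.
        # The copy leaves the input dictionaries unmodified.
        result.append({**account, "sub_accounts": []})
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(children))
    return result


# Hypothetical sample with two levels of nesting and a sibling.
data = [
    {"id": "1", "sub_accounts": [
        {"id": "2", "sub_accounts": [{"id": "3", "sub_accounts": []}]},
        {"id": "4", "sub_accounts": []},
    ]},
    {"id": "5", "sub_accounts": []},
]

print([a["id"] for a in flatten_iterative(data)])  # ['1', '2', '3', '4', '5']
```

To drop the key entirely instead of keeping an empty list, the append line could build the copy without "sub_accounts".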
