Python: how do I find all keys in each JSON file and output them as a DataFrame?

t40tm48m posted on 2023-01-19 in Python

Suppose I have n JSON files, and the first four look like this:
path/file1.json

{"keyA":"valueA",
"keyB":"valueB",
"keyC":"valueC",
"keyD":{
    "keyD_1":"valueD_1",
    "KeyD_2":"valueD_2"
    }
}

path/file2.json

{"keyB":"valueB",
"keyC":"valueC",
"keyD":{
    "KeyD_2":"valueD_2"
    }
}

path/file3.json

{"keyA":"valueA",
"keyB":"valueB",
"keyD":{
    "keyD_1":"valueD_1",
    "KeyD_2":"valueD_2"
    }
}

path/file4.json

{"keyB":"valueB",
"keyD":{
    "KeyD_1":"valueD_1"
    }
}

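Using Python, I want to build a table (DataFrame) with the keys as columns and the file names as rows. In each row, a key's column should be 1 if that key appears anywhere in the JSON file (including nested keys such as keyD_1) and 0 otherwise. For the example above, I need this output: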

file_name   keyA    keyB    keyC    keyD    keyD_1  keyD_2
file1.json  1       1       1       1       1       1
file2.json  0       1       1       1       0       1
file3.json  1       1       0       1       1       1
file4.json  0       1       0       1       1       0
(...)       (...)   (...)   (...)   (...)   (...)   (...)

Any suggestions would be greatly appreciated. Best,
Nigel
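The first step is reading the n files from disk into a {file name: parsed JSON} dictionary. Below is a minimal sketch, assuming all the files sit in one folder and end in .json. The name load_json_files is hypothetical, and the demo writes two of the example files to a temporary folder only so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_json_files(directory):
    """Hypothetical helper: map each *.json file name in `directory` to its parsed content."""
    files = {}
    # sorted() gives a stable, alphabetical row order (note: "file10.json" sorts before "file2.json")
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            files[path.name] = json.load(fh)
    return files


# Demo only: write two of the example files to a temporary folder, then load them back.
examples = {
    "file1.json": {
        "keyA": "valueA",
        "keyB": "valueB",
        "keyC": "valueC",
        "keyD": {"keyD_1": "valueD_1", "KeyD_2": "valueD_2"},
    },
    "file2.json": {"keyB": "valueB", "keyC": "valueC", "keyD": {"KeyD_2": "valueD_2"}},
}
with tempfile.TemporaryDirectory() as tmp:
    for name, content in examples.items():
        Path(tmp, name).write_text(json.dumps(content), encoding="utf-8")
    files = load_json_files(tmp)

print(list(files))  # ['file1.json', 'file2.json']
```

Pointing load_json_files at the real folder (path/ in the question) would give a dictionary of the same shape for all n files.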

Answer from vhmi4jdf:

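You can load the contents of the files into a dictionary (keys are the file names, values are the data in each file) and then build the DataFrame from it: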

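import pandas as pd

# File name -> parsed JSON content (hard-coded here to mirror the four example files)
files = {
    "file1.json": {
        "keyA": "valueA",
        "keyB": "valueB",
        "keyC": "valueC",
        "keyD": {"keyD_1": "valueD_1", "KeyD_2": "valueD_2"},
    },
    "file2.json": {"keyB": "valueB", "keyC": "valueC", "keyD": {"KeyD_2": "valueD_2"}},
    "file3.json": {
        "keyA": "valueA",
        "keyB": "valueB",
        "keyD": {"keyD_1": "valueD_1", "KeyD_2": "valueD_2"},
    },
    "file4.json": {"keyB": "valueB", "keyD": {"KeyD_1": "valueD_1"}},
}

all_data = []
for data in files.values():
    # Mark each top-level key, plus each key of any nested dict (e.g. keyD), as present.
    # This avoids the KeyError that data["keyD"] would raise for a file without keyD.
    row = {k: 1 for k in data}
    for value in data.values():
        if isinstance(value, dict):
            row.update({k: 1 for k in value})
    all_data.append(row)

# Keys absent from a file become NaN; fill them with 0 and cast back to int
df = pd.DataFrame(all_data, index=list(files)).fillna(0).astype(int)
print(df)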

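Prints (keys are case-sensitive, so KeyD_1 in file4.json gets its own column instead of counting toward keyD_1):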

            keyA  keyB  keyC  keyD  keyD_1  KeyD_2  KeyD_1
file1.json     1     1     1     1       1       1       0
file2.json     0     1     1     1       0       1       0
file3.json     1     1     0     1       1       1       0
file4.json     0     1     0     1       0       0       1
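The answer looks only one level deep and treats KeyD_1 and keyD_1 as different keys, so its output differs from the table in the question. Below is a minimal sketch that walks nested dicts to any depth and, as an assumption, merges keys that differ only in letter case. The names iter_keys and canonical are illustrative, not part of the answer. Each merged column keeps the first spelling seen, so the header reads KeyD_2 rather than keyD_2, but the 0/1 values follow the question's table.

```python
import pandas as pd

# Same example data as in the answer (file name -> parsed JSON)
files = {
    "file1.json": {
        "keyA": "valueA",
        "keyB": "valueB",
        "keyC": "valueC",
        "keyD": {"keyD_1": "valueD_1", "KeyD_2": "valueD_2"},
    },
    "file2.json": {"keyB": "valueB", "keyC": "valueC", "keyD": {"KeyD_2": "valueD_2"}},
    "file3.json": {
        "keyA": "valueA",
        "keyB": "valueB",
        "keyD": {"keyD_1": "valueD_1", "KeyD_2": "valueD_2"},
    },
    "file4.json": {"keyB": "valueB", "keyD": {"KeyD_1": "valueD_1"}},
}


def iter_keys(obj):
    """Yield every key of a dict and, recursively, of any dicts nested inside it."""
    for key, value in obj.items():
        yield key
        if isinstance(value, dict):
            yield from iter_keys(value)


# Assumption: keys that differ only in case ("KeyD_1" vs "keyD_1") are the same key.
# canonical maps each lowercased key to the first spelling encountered.
canonical = {}
rows = []
for data in files.values():
    row = {}
    for key in iter_keys(data):
        row[canonical.setdefault(key.lower(), key)] = 1
    rows.append(row)

# Columns follow first appearance; missing keys become NaN, then 0
df = (
    pd.DataFrame(rows, index=list(files))
    .fillna(0)
    .astype(int)
    .rename_axis("file_name")
)
print(df)
```

Dropping the .lower() call (using key directly as the column name) turns the case merging off again while keeping the recursive walk.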
