Creating a normalized DataFrame from nested JSON

wmtdaxz3 posted on 2023-08-08 in Other

I am trying to create a DataFrame from a nested JSON file, but I'm running into trouble.

[
    {
        "rec_id": "1",
        "user": {
            "id": "12414",
            "name": "Steve"
        },
        "addresses": [
            {
                "address_type": "Home",
                "street1": "100 Main St",
                "street2": null,
                "city": "Chicago"
            },
            {
                "address_type": "Work",
                "street1": "100 Main St",
                "street2": null,
                "city": "Chicago"
            }
        ],
        "timestamp": "2023-07-28T20:05:14.859000+00:00",
        "edited_timestamp": null
    },
    {
        "rec_id": "2",
        "user": {
            "id": "214521",
            "name": "Tim"
        },
        "addresses": [
            {
                "address_type": "Home",
                "street1": "100 Main St",
                "street2": null,
                "city": "Boston"
            },
            {
                "address_type": "Work",
                "street1": "100 Main St",
                "street2": null,
                "city": "Boston"
            }
        ],
        "timestamp": "2023-07-28T20:05:14.859000+00:00",
        "edited_timestamp": null
    },
    {
        "rec_id": "3",
        "user": {
            "id": "12121",
            "name": "Jack"
        },
        "addresses": [
            {
                "address_type": "Home",
                "street1": "100 Main St",
                "street2": null,
                "city": "Las Vegas"
            }
        ],
        "timestamp": "2023-07-28T20:05:14.859000+00:00",
        "edited_timestamp": null
    }
]

I tried the following:

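import json
import pandas as pd

with open("employee.json") as file:
    data = json.load(file)

# Flattens nested dicts such as "user" into dotted columns (user.id, user.name)
data_df = pd.json_normalize(data)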

data_df.columns.values.tolist()

['rec_id',
 'addresses',
 'timestamp',
 'edited_timestamp',
 'user.id',
 'user.name']

display(data_df)
rec_id  addresses   timestamp   edited_timestamp    user.id user.name
0   1   [{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Chicago'}]    2023-07-28T20:05:14.859000+00:00    None    12414   Steve
1   2   [{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}, {'address_type': 'Work', 'street1': '100 Main St', 'street2': None, 'city': 'Boston'}]  2023-07-28T20:05:14.859000+00:00    None    214521  Tim
2   3   [{'address_type': 'Home', 'street1': '100 Main St', 'street2': None, 'city': 'Las Vegas'}]  2023-07-28T20:05:14.859000+00:00    None    12121   Jack
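This is how pd.json_normalize behaves by default: without record_path it flattens nested dictionaries into dotted column names (user becomes user.id and user.name) but leaves list values untouched, so each addresses cell still holds a Python list of dicts and there is one row per record. A minimal sketch on a trimmed, two-record inline copy of the question's data (the trimming is an assumption made for brevity) shows this:

```python
import pandas as pd

# Trimmed inline copy of the question's data (only some fields kept)
data = [
    {"rec_id": "1", "user": {"id": "12414", "name": "Steve"},
     "addresses": [{"address_type": "Home", "city": "Chicago"},
                   {"address_type": "Work", "city": "Chicago"}]},
    {"rec_id": "3", "user": {"id": "12121", "name": "Jack"},
     "addresses": [{"address_type": "Home", "city": "Las Vegas"}]},
]

flat = pd.json_normalize(data)

# The nested "user" dict is flattened into dotted columns...
print("user.id" in flat.columns, "user.name" in flat.columns)  # True True
# ...but each "addresses" cell still holds the original list of dicts
print(type(flat.loc[0, "addresses"]))  # <class 'list'>
print(len(flat))  # 2: one row per record, not one per address
```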

How can I get the following output?

rec_id  addresses.address_type  addresses.street1   addresses.street2   addresses.city  timestamp   edited_timestamp    user.id user.name
0   1   Home    100 Main St None    Chicago 2023-07-28T20:05:14.859000+00:00    None    12414   Steve
1   1   Work    100 Main St None    Chicago 2023-07-28T20:05:14.859000+00:00    None    12414   Steve
2   2   Home    100 Main St None    Boston  2023-07-28T20:05:14.859000+00:00    None    214521  Tim
3   2   Work    100 Main St None    Boston  2023-07-28T20:05:14.859000+00:00    None    214521  Tim
4   3   Home    100 Main St None    Las Vegas   2023-07-28T20:05:14.859000+00:00    None    12121   Jack
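Besides the record_path approach in the answers below, a different technique is to explode the list column into one row per address and then normalize the address dicts on their own. This is a sketch on a trimmed, two-record inline copy of the question's data, and it assumes every record has at least one address: explode turns an empty list into a NaN row, which differs from record_path, where such a record simply contributes no rows.

```python
import pandas as pd

# Trimmed inline copy of the question's data (only some fields kept)
data = [
    {"rec_id": "1", "user": {"id": "12414", "name": "Steve"},
     "addresses": [{"address_type": "Home", "city": "Chicago"},
                   {"address_type": "Work", "city": "Chicago"}]},
    {"rec_id": "3", "user": {"id": "12121", "name": "Jack"},
     "addresses": [{"address_type": "Home", "city": "Las Vegas"}]},
]

# Step 1: flatten the nested "user" dict; "addresses" stays a list column
flat = pd.json_normalize(data)

# Step 2: one row per address, with a fresh 0..n-1 index
exploded = flat.explode("addresses").reset_index(drop=True)

# Step 3: expand each address dict into columns, prefixed like the desired output
addr = pd.json_normalize(exploded["addresses"].tolist()).add_prefix("addresses.")

# Step 4: place the address columns next to the per-record fields (aligned on index)
result = pd.concat([exploded.drop(columns="addresses"), addr], axis=1)
print(result[["rec_id", "addresses.address_type", "addresses.city", "user.name"]])
```

For this particular question, record_path (as in the answers) is the more direct tool; the explode route is mainly useful when the list column has already been loaded into a DataFrame.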

Answer #1, by kzipqqlq

Try this (see the pandas documentation for more information on .json_normalize):

df = pd.json_normalize(data, 
                       record_path='addresses', 
                       meta=['rec_id', ["user", "id"], ["user", "name"], 'timestamp', 'edited_timestamp'])

Output:
| address_type | street1 | street2 | city | rec_id | user.id | user.name | timestamp | edited_timestamp |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Home | 100 Main St | | Chicago | 1 | 12414 | Steve | 2023-07-28T20:05:14.859000+00:00 | |
| Work | 100 Main St | | Chicago | 1 | 12414 | Steve | 2023-07-28T20:05:14.859000+00:00 | |
| Home | 100 Main St | | Boston | 2 | 214521 | Tim | 2023-07-28T20:05:14.859000+00:00 | |
| Work | 100 Main St | | Boston | 2 | 214521 | Tim | 2023-07-28T20:05:14.859000+00:00 | |
| Home | 100 Main St | | Las Vegas | 3 | 12121 | Jack | 2023-07-28T20:05:14.859000+00:00 | |
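To check this answer without employee.json, here is a self-contained sketch on a trimmed inline copy of the question's data (an assumption for brevity; only some fields are kept). record_path="addresses" yields one row per address, every meta entry is repeated on each row produced from its record, and list-form paths such as ["user", "id"] are joined with the default sep="." to give user.id:

```python
import pandas as pd

# Trimmed inline copy of the question's data (only some fields kept)
data = [
    {"rec_id": "1", "user": {"id": "12414", "name": "Steve"},
     "addresses": [{"address_type": "Home", "city": "Chicago"},
                   {"address_type": "Work", "city": "Chicago"}]},
    {"rec_id": "3", "user": {"id": "12121", "name": "Jack"},
     "addresses": [{"address_type": "Home", "city": "Las Vegas"}]},
]

df = pd.json_normalize(
    data,
    record_path="addresses",  # one output row per element of each "addresses" list
    meta=["rec_id", ["user", "id"], ["user", "name"]],  # repeated on every row of its record
)
print(df.columns.tolist())
# ['address_type', 'city', 'rec_id', 'user.id', 'user.name']
print(df["user.name"].tolist())
# ['Steve', 'Steve', 'Jack']
```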

Answer #2, by bq3bfh9z

Try this:

data_df = pd.json_normalize(
    data,
    meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
    record_path=["addresses"],
    record_prefix="addresses.",
)
print(data_df)

Prints:

addresses.address_type addresses.street1 addresses.street2 addresses.city rec_id                         timestamp edited_timestamp user.id user.name
0                   Home       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
1                   Work       100 Main St              None        Chicago      1  2023-07-28T20:05:14.859000+00:00             None   12414     Steve
2                   Home       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
3                   Work       100 Main St              None         Boston      2  2023-07-28T20:05:14.859000+00:00             None  214521       Tim
4                   Home       100 Main St              None      Las Vegas      3  2023-07-28T20:05:14.859000+00:00             None   12121      Jack
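This answer's columns start with the addresses.* fields, whereas the desired layout in the question puts rec_id first. Since the meta columns are appended after the record columns in the order listed in meta, moving rec_id to the front is enough to reproduce the requested column order. A minimal sketch on trimmed inline data (assumed for illustration, not the asker's file), using DataFrame.pop and DataFrame.insert:

```python
import pandas as pd

# Trimmed inline copy of the question's data (street1/street2 omitted for brevity)
data = [
    {"rec_id": "1", "user": {"id": "12414", "name": "Steve"},
     "addresses": [{"address_type": "Home", "city": "Chicago"},
                   {"address_type": "Work", "city": "Chicago"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
    {"rec_id": "3", "user": {"id": "12121", "name": "Jack"},
     "addresses": [{"address_type": "Home", "city": "Las Vegas"}],
     "timestamp": "2023-07-28T20:05:14.859000+00:00", "edited_timestamp": None},
]

data_df = pd.json_normalize(
    data,
    meta=["rec_id", "timestamp", "edited_timestamp", ["user", "id"], ["user", "name"]],
    record_path=["addresses"],
    record_prefix="addresses.",
)

# Remove rec_id from its current position and re-insert it as the first column
data_df.insert(0, "rec_id", data_df.pop("rec_id"))
print(data_df.columns.tolist())
# ['rec_id', 'addresses.address_type', 'addresses.city', 'timestamp', 'edited_timestamp', 'user.id', 'user.name']
```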
