Convert a list within a list into a Python pandas DataFrame

jgwigjjp posted on 2023-02-02 in Python

I have the following list:

test={'data': [{'name': 'john',
   'insights': {'data': [{'id': '123',
      'person_id': '456',
      'date_start': '2022-12-31',
      'date_stop': '2023-01-29',
      'impressions': '4070',
      'spend': '36.14'}],
    'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
   'id': '978'}]}

I want to create a pandas DataFrame whose columns are name, date_start, date_stop, impressions, and spend.
I tried:

data = pd.DataFrame()
data = data.append(test['data'])

but insights now ends up as a single column, like this:

name    insights                                             id
john    {'data': [{'id': '123', 'person_id': '456', 'd...   978

How can I get impressions and spend out of the insights column? When I try

test['data']['insights']

I get the error

list indices must be integers or slices, not str

jfgube3f #1

Use pd.json_normalize:

>>> pd.json_normalize(test["data"], ['insights', 'data'])

    id person_id  date_start   date_stop impressions  spend
0  123       456  2022-12-31  2023-01-29        4070  36.14
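
The call above returns only the fields of the inner records, so the name column the question asks for is missing. One way to carry it along is the meta argument of json_normalize. This is a minimal sketch, assuming every record in test["data"] has a name key; the variable cols is just an illustrative name. Note that adding the outer id to meta as well would clash with the inner id unless meta_prefix is also set.

```python
import pandas as pd

# Same data as in the question
test = {"data": [{"name": "john",
                  "insights": {"data": [{"id": "123",
                                         "person_id": "456",
                                         "date_start": "2022-12-31",
                                         "date_stop": "2023-01-29",
                                         "impressions": "4070",
                                         "spend": "36.14"}],
                               "paging": {"cursors": {"before": "MAZDZD",
                                                      "after": "MAZDZD"}}},
                  "id": "978"}]}

# Columns requested in the question (illustrative variable name)
cols = ["name", "date_start", "date_stop", "impressions", "spend"]

df = pd.json_normalize(
    test["data"],
    record_path=["insights", "data"],  # one row per inner insights record
    meta=["name"],                     # copy the outer "name" onto each row
)[cols]                                # keep and order only the wanted columns

print(df)
```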

ql3eal8s #2

One option is to use pandas.json_normalize together with pandas.Series.explode:

df = (
        pd.json_normalize(test["data"])
            ['insights.data']
                .explode()
                .pipe(lambda s: pd.DataFrame(s.tolist()))
      )

Output:

print(df)

    id person_id  date_start   date_stop impressions  spend
0  123       456  2022-12-31  2023-01-29        4070  36.14
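
If the name column is also needed, the same explode idea can be applied to the frame instead of a single Series. This is only a sketch: it assumes every record has at least one entry in insights.data (an empty list would explode to NaN and break the DataFrame construction), and the names flat and details are illustrative.

```python
import pandas as pd

# Same data as in the question
test = {"data": [{"name": "john",
                  "insights": {"data": [{"id": "123",
                                         "person_id": "456",
                                         "date_start": "2022-12-31",
                                         "date_stop": "2023-01-29",
                                         "impressions": "4070",
                                         "spend": "36.14"}],
                               "paging": {"cursors": {"before": "MAZDZD",
                                                      "after": "MAZDZD"}}},
                  "id": "978"}]}

# Flatten the outer level; "insights.data" stays a column of lists,
# so explode it to get one row per inner record while keeping "name".
flat = (
    pd.json_normalize(test["data"])[["name", "insights.data"]]
    .explode("insights.data", ignore_index=True)
)

# Turn the column of dicts into real columns
details = pd.DataFrame(flat["insights.data"].tolist())

# Both frames share the same 0..n-1 index, so a column-wise concat lines up
df = pd.concat([flat[["name"]], details], axis=1)
df = df[["name", "date_start", "date_stop", "impressions", "spend"]]

print(df)
```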

nkkqxpd9 #3

Try:

import pandas as pd

test = {
    "data": [
        {
            "name": "john",
            "insights": {
                "data": [
                    {
                        "id": "123",
                        "person_id": "456",
                        "date_start": "2022-12-31",
                        "date_stop": "2023-01-29",
                        "impressions": "4070",
                        "spend": "36.14",
                    }
                ],
                "paging": {"cursors": {"before": "MAZDZD", "after": "MAZDZD"}},
            },
            "id": "978",
        }
    ]
}

df = pd.DataFrame(
    [
        {
            "name": d["name"],
            "date_start": dd["date_start"],
            "date_stop": dd["date_stop"],
            "impressions": dd["impressions"],
            "spend": dd["spend"],
        }
        for d in test["data"]
        for dd in d["insights"]["data"]
    ]
)
print(df)

Prints:

   name  date_start   date_stop impressions  spend
0  john  2022-12-31  2023-01-29        4070  36.14
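
A side note: every value in the source data is a string, so impressions, spend and the two dates end up as object columns. If numeric or date operations are needed later, they can be converted afterwards. This is an optional sketch on a small stand-in frame with the same values, not something the question asks for.

```python
import pandas as pd

# Stand-in frame with the same string values as the question's data
df = pd.DataFrame(
    {
        "name": ["john"],
        "date_start": ["2022-12-31"],
        "date_stop": ["2023-01-29"],
        "impressions": ["4070"],
        "spend": ["36.14"],
    }
)

# Strings -> numbers (integer for "4070", float for "36.14")
df["impressions"] = pd.to_numeric(df["impressions"])
df["spend"] = pd.to_numeric(df["spend"])

# Strings -> datetime64
for col in ("date_start", "date_stop"):
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
```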

zbsbpyhn #4

Here is an alternative approach:

df = pd.DataFrame(test["data"][0]["insights"]["data"], 
        columns=["name", "date_start", "date_stop", "impressions", "spend"])
df["name"] = test["data"][0]["name"]
print(df)

   name  date_start   date_stop impressions  spend
0  john  2022-12-31  2023-01-29        4070  36.14

qvtsj1bj #5

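That is a seriously messy DataFrame. With data built as in the question, the values can be read by indexing into the insights column: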

data['insights'][0]['data'][0]['impressions']
data['insights'][0]['data'][0]['spend']
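
As for the error in the question: test['data'] is a list, not a dict, so it has to be indexed with an integer before the 'insights' key can be used. The same applies to the inner 'data' list. A small sketch on the raw dictionary, without pandas; the names record and insight are illustrative.

```python
# Same data as in the question
test = {"data": [{"name": "john",
                  "insights": {"data": [{"id": "123",
                                         "person_id": "456",
                                         "date_start": "2022-12-31",
                                         "date_stop": "2023-01-29",
                                         "impressions": "4070",
                                         "spend": "36.14"}],
                               "paging": {"cursors": {"before": "MAZDZD",
                                                      "after": "MAZDZD"}}},
                  "id": "978"}]}

# Indexing the list with a string key reproduces the error from the question
try:
    test["data"]["insights"]
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str

# Pick the list element first, then use the dict keys
record = test["data"][0]                 # first (and only) outer record
insight = record["insights"]["data"][0]  # first inner insights record

print(insight["impressions"], insight["spend"])
```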
