pandas 如何在不创建重复行的情况下拆分多个dataframe列中的列表中的多个嵌套字典?

3mpgtkmj  于 2023-05-12  发布在  其他
关注(0)|答案(1)|浏览(124)

我有一个数据框

column1                                   column2                                 column3                                   column4
0        1  [{'key1': 'value1'}, {'key2': 'value2'}]  {'key1': 'value10', 'key2': 'value11'}  [{'key1': 'value1'}, {'key2': 'value2'}]
1        2  [{'key1': 'value3'}, {'key2': 'value4'}]  {'key1': 'value12', 'key2': 'value13'}  [{'key1': 'value3'}, {'key2': 'value4'}]
2        3  [{'key1': 'value5'}, {'key2': 'value6'}]  {'key1': 'value14', 'key2': 'value15'}  [{'key1': 'value5'}, {'key2': 'value6'}]
.
.

我尝试将column2和column4拆分为列,如下所示

col1     col2_key1        col2_key2     col3                                       col4_key1    col4_key2
1          value1           value2        {'key1': 'value10', 'key2': 'value11'}    value1       value2
2          value3           value4        {'key1': 'value12', 'key2': 'value13'}   value3     value4
.
.

最初我是这样做的,创建了很多副本

dfs = {k:pd.DataFrame(x) for k, x in df.pop('column2').items()}
df = df.join(pd.concat(dfs).add_prefix('column2_').reset_index(level=1, drop=True)).reset_index(drop=True)

dfs = {k:pd.DataFrame(x) for k, x in df.pop('column4').items()}
df = df.join(pd.concat(dfs).add_prefix('column4_').reset_index(level=1, drop=True)).reset_index(drop=True)

然后我使用json_normalize给我一些奇怪的数据名称

df['column2'] = df['column2'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])
df['column4'] = df['column4'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])

我做错了什么?
这是我的代码

import pandas as pd

# Sample data
data = {
    'column1': [1, 2, 3],
    'column2': [[{'key1': 'value1'}, {'key2': 'value2'}],
                [{'key1': 'value3'}, {'key2': 'value4'}],
                [{'key1': 'value5'}, {'key2': 'value6'}]],
    'column3': [{'key1': 'value10', 'key2': 'value11'},
                {'key1': 'value12', 'key2': 'value13'},
                {'key1': 'value14', 'key2': 'value15'}],
    'column4': [[{'key1': 'value1'}, {'key2': 'value2'}],
                [{'key1': 'value3'}, {'key2': 'value4'}],
                [{'key1': 'value5'}, {'key2': 'value6'}]],
}

# Create DataFrame
df = pd.DataFrame(data)
print(df)

'''
dfs = {k:pd.DataFrame(x) for k, x in df.pop('column2').items()}
df = df.join(pd.concat(dfs).add_prefix('column2_').reset_index(level=1, drop=True)).reset_index(drop=True)

dfs = {k:pd.DataFrame(x) for k, x in df.pop('column4').items()}
df = df.join(pd.concat(dfs).add_prefix('column4_').reset_index(level=1, drop=True)).reset_index(drop=True)
'''
# Apply pd.json_normalize on column2 and column4
df['column2'] = df['column2'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])
df['column4'] = df['column4'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])

# Print the modified DataFrame
print(df)
zf9nrax1

zf9nrax11#

尝试:

df = pd.concat([df, df.pop('column2').apply(lambda x: pd.Series({f'column2_{k}': v for d in x for k, v in d.items()}))], axis=1)
df = pd.concat([df, df.pop('column4').apply(lambda x: pd.Series({f'column4_{k}': v for d in x for k, v in d.items()}))], axis=1)
print(df)

图纸:

column1                                 column3 column2_key1 column2_key2 column4_key1 column4_key2
0        1  {'key1': 'value10', 'key2': 'value11'}       value1       value2       value1       value2
1        2  {'key1': 'value12', 'key2': 'value13'}       value3       value4       value3       value4
2        3  {'key1': 'value14', 'key2': 'value15'}       value5       value6       value5       value6

相关问题