我有一个数据框
column1 column2 column3 column4
0 1 [{'key1': 'value1'}, {'key2': 'value2'}] {'key1': 'value10', 'key2': 'value11'} [{'key1': 'value1'}, {'key2': 'value2'}]
1 2 [{'key1': 'value3'}, {'key2': 'value4'}] {'key1': 'value12', 'key2': 'value13'} [{'key1': 'value3'}, {'key2': 'value4'}]
2 3 [{'key1': 'value5'}, {'key2': 'value6'}] {'key1': 'value14', 'key2': 'value15'} [{'key1': 'value5'}, {'key2': 'value6'}]
.
.
我尝试将column2和column4拆分为列,如下所示
col1 col2_key1 col2_key2 col3 col4_key1 col4_key2
1 value1 value2 {'key1': 'value10', 'key2': 'value11'} value1 value2
2 value3 value4 {'key1': 'value12', 'key2': 'value13'} value3 value4
.
.
最初我是这样做的,创建了很多副本
dfs = {k:pd.DataFrame(x) for k, x in df.pop('column2').items()}
df = df.join(pd.concat(dfs).add_prefix('column2_').reset_index(level=1, drop=True)).reset_index(drop=True)
dfs = {k:pd.DataFrame(x) for k, x in df.pop('column4').items()}
df = df.join(pd.concat(dfs).add_prefix('column4_').reset_index(level=1, drop=True)).reset_index(drop=True)
然后我使用json_normalize
给我一些奇怪的数据名称
df['column2'] = df['column2'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])
df['column4'] = df['column4'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])
我做错了什么?
这是我的代码
import pandas as pd
# Sample data
data = {
'column1': [1, 2, 3],
'column2': [[{'key1': 'value1'}, {'key2': 'value2'}],
[{'key1': 'value3'}, {'key2': 'value4'}],
[{'key1': 'value5'}, {'key2': 'value6'}]],
'column3': [{'key1': 'value10', 'key2': 'value11'},
{'key1': 'value12', 'key2': 'value13'},
{'key1': 'value14', 'key2': 'value15'}],
'column4': [[{'key1': 'value1'}, {'key2': 'value2'}],
[{'key1': 'value3'}, {'key2': 'value4'}],
[{'key1': 'value5'}, {'key2': 'value6'}]],
}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
'''
dfs = {k:pd.DataFrame(x) for k, x in df.pop('column2').items()}
df = df.join(pd.concat(dfs).add_prefix('column2_').reset_index(level=1, drop=True)).reset_index(drop=True)
dfs = {k:pd.DataFrame(x) for k, x in df.pop('column4').items()}
df = df.join(pd.concat(dfs).add_prefix('column4_').reset_index(level=1, drop=True)).reset_index(drop=True)
'''
# Apply pd.json_normalize on column2 and column4
df['column2'] = df['column2'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])
df['column4'] = df['column4'].apply(lambda x: pd.json_normalize(x[0]) if isinstance(x[0], dict) else x[0])
# Print the modified DataFrame
print(df)
1条答案
按热度按时间zf9nrax11#
尝试:
图纸: