我有一个pandas dataframe,其中2列是具有十进制值的嵌套列:df.tail(1).to_dict('list')
给出了这种数据
{'nested_col1': [array([{'key1': 'CO', 'key2': Decimal('8.940000000')}],
dtype=object)], 'nested_col2': [array([{'key3': 'CO', 'key4': 'P14', 'key5': Decimal('8.940000000'), 'key6': None}],
dtype=object)]}
我试图用这个来分解 Dataframe :
df = (df.drop(cols, axis=1)
.join(pd.concat(
[pd.json_normalize(df[x].explode(), errors='ignore').applymap(
lambda x: str(x) if isinstance(x, (int, float)) else x).add_prefix(f'{x}.') for x in
cols],
axis=1)))
在某些情况下,我会出现以下错误:
Traceback (most recent call last):
File "data_load.py.py", line 365, in <module>
df = prepare_data(data, transaction_id, cohort_no)
File "data_load.py.py", line 274, in prepare_data
df = flatten_dataframe(cols_to_explode, df)
File "data_load.py.py", line 204, in flatten_dataframe
df1 = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
File "data_load.py.py", line 204, in <listcomp>
df1 = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
File "/project1/venv/lib/python3.6/site-packages/pandas/io/json/_normalize.py", line 270, in _json_normalize
if any([isinstance(x, dict) for x in y.values()] for y in data):
File "/project1/venv/lib/python3.6/site-packages/pandas/io/json/_normalize.py", line 270, in <genexpr>
if any([isinstance(x, dict) for x in y.values()] for y in data):
AttributeError: 'float' object has no attribute 'values'
failed to run commands: exit status 1
我还错过了什么或者有什么更好的方法来做同样的事情吗?
预期输出应为:
nested_col1.key1,nested_col1.key2 nested_col2.key3 ... like this
2条答案
按热度按时间nxowjjhe1#
似乎每个嵌套列只有一个元素:
注意:如果每个嵌套列表中有多条记录,则可以将
[0]
替换为.explode()
。wgxvkvu92#
可以使用
json_normalize
和concat
:输出: