json: Converting lists inside a nested dictionary into a Pandas DataFrame in Python

Posted by u0njafvf on 2023-02-10 in Python

I have the following dictionary:

test = {'data': [
  {'actions': [
    {'action_type': 'link_click', 'value': '16'},
    {'action_type': 'post_engagement', 'value': '16'},
    {'action_type': 'page_engagement', 'value': '16'}],
   'spend': '13.59',
   'date_start': '2023-02-07',
   'date_stop': '2023-02-07'},
  {'actions': [
    {'action_type': 'comment', 'value': '5'},
    {'action_type': 'onsite_conversion.post_save', 'value': '1'},
    {'action_type': 'link_click', 'value': '465'},
    {'action_type': 'post', 'value': '1'},
    {'action_type': 'post_reaction', 'value': '20'},
    {'action_type': 'video_view', 'value': '4462'},
    {'action_type': 'post_engagement', 'value': '4954'},
    {'action_type': 'page_engagement', 'value': '4954'}],
   'spend': '214.71',
   'date_start': '2023-02-07',
   'date_stop': '2023-02-07'}]}

I am trying to convert it into a Pandas DataFrame in which each action_type becomes a column and its value goes in the row, like this:

link_click post_engagement page_engagement   spend comment onsite_conversion ...
        16              16              16   13.59     N/A               N/A
       465            4954            4954  214.71       5                 1

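I know the first list has no comment, post, etc., so those cells will be N/A in the first row. How can I handle this complex data structure?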

Answer #1 (wqsoz72f)

You can use a function like the following:

import pandas as pd

def tabulate_actions(actionsList: list, returnDf=False):
    # One dict per record, mapping each action_type to its value
    aTbl = [{
        a['action_type']: a['value'] for a in al['actions']
        # if isinstance(a, dict) and all(k in a for k in ['action_type', 'value'])
    } for al in actionsList
        # if isinstance(al, dict) and isinstance(al.get('actions'), list)
    ]
    # pd.DataFrame fills action types missing from a record with NaN
    return pd.DataFrame(aTbl) if returnDf else aTbl
## uncomment the conditions if you're unsure of your data structure

tabulate_actions(test['data']) should return the following list of dictionaries:

[{'link_click': '16', 'post_engagement': '16', 'page_engagement': '16'},
 {'comment': '5',
  'onsite_conversion.post_save': '1',
  'link_click': '465',
  'post': '1',
  'post_reaction': '20',
  'video_view': '4462',
  'post_engagement': '4954',
  'page_engagement': '4954'}]

and passing returnDf=True should make it return a DataFrame instead, with NaN wherever a record lacks an action type.
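A minimal usage sketch, assuming you also want the spend and date fields shown in the question's target table. It calls tabulate_actions with returnDf=True, then attaches spend, date_start and date_stop with pd.concat and converts the string values with pd.to_numeric. The concat and to_numeric steps are an illustrative addition, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
test = {'data': [
    {'actions': [
        {'action_type': 'link_click', 'value': '16'},
        {'action_type': 'post_engagement', 'value': '16'},
        {'action_type': 'page_engagement', 'value': '16'}],
     'spend': '13.59',
     'date_start': '2023-02-07',
     'date_stop': '2023-02-07'},
    {'actions': [
        {'action_type': 'comment', 'value': '5'},
        {'action_type': 'onsite_conversion.post_save', 'value': '1'},
        {'action_type': 'link_click', 'value': '465'},
        {'action_type': 'post', 'value': '1'},
        {'action_type': 'post_reaction', 'value': '20'},
        {'action_type': 'video_view', 'value': '4462'},
        {'action_type': 'post_engagement', 'value': '4954'},
        {'action_type': 'page_engagement', 'value': '4954'}],
     'spend': '214.71',
     'date_start': '2023-02-07',
     'date_stop': '2023-02-07'}]}


def tabulate_actions(actionsList: list, returnDf=False):
    # One dict per record, mapping each action_type to its value
    aTbl = [{a['action_type']: a['value'] for a in al['actions']}
            for al in actionsList]
    return pd.DataFrame(aTbl) if returnDf else aTbl


# Action columns only; absent action types become NaN
actions_df = tabulate_actions(test['data'], returnDf=True)

# Illustrative extra step: attach the record-level fields, aligned by row position
meta_df = pd.DataFrame(test['data'])[['spend', 'date_start', 'date_stop']]
df = pd.concat([actions_df, meta_df], axis=1)

# The values arrive as strings; convert the numeric columns (NaN stays NaN)
num_cols = list(actions_df.columns) + ['spend']
df[num_cols] = df[num_cols].apply(pd.to_numeric)

print(df)
```

Concatenating by position works here because both frames come from the same list in the same order; if you filter records inside tabulate_actions (the commented-out conditions), the rows would no longer line up.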

Answer #2 (pbossiut)

You can try the following code:

import pandas as pd

# Extract the actions, tagging each one with the index of the record it came from
flat_actions = [
    {**item, 'record': i}
    for i, d in enumerate(test['data'])
    for item in d['actions']
]

# Create a dataframe from the flattened list
df = pd.DataFrame(flat_actions)

# Pivot: one row per record, one column per action_type, value in the cells
# (index=None would keep one row per action instead of one row per record)
df_pivot = df.pivot(index='record', columns='action_type', values='value')

print(df_pivot)
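An alternative sketch of the same long-to-wide pivot, using pd.json_normalize with record_path and meta so that spend and the dates survive the reshape. The record key is a hypothetical helper column added here so that two records with identical metadata stay separate; passing a list to pivot's index requires pandas 1.1 or later.

```python
import pandas as pd

# Sample data copied from the question
test = {'data': [
    {'actions': [
        {'action_type': 'link_click', 'value': '16'},
        {'action_type': 'post_engagement', 'value': '16'},
        {'action_type': 'page_engagement', 'value': '16'}],
     'spend': '13.59',
     'date_start': '2023-02-07',
     'date_stop': '2023-02-07'},
    {'actions': [
        {'action_type': 'comment', 'value': '5'},
        {'action_type': 'onsite_conversion.post_save', 'value': '1'},
        {'action_type': 'link_click', 'value': '465'},
        {'action_type': 'post', 'value': '1'},
        {'action_type': 'post_reaction', 'value': '20'},
        {'action_type': 'video_view', 'value': '4462'},
        {'action_type': 'post_engagement', 'value': '4954'},
        {'action_type': 'page_engagement', 'value': '4954'}],
     'spend': '214.71',
     'date_start': '2023-02-07',
     'date_stop': '2023-02-07'}]}

# Hypothetical helper key: the record's position in the list
records = [{**rec, 'record': i} for i, rec in enumerate(test['data'])]

# Long format: one row per action, record-level fields repeated on each row
long_df = pd.json_normalize(
    records,
    record_path='actions',
    meta=['record', 'spend', 'date_start', 'date_stop'],
)

# Wide format: one row per record, one column per action_type
wide_df = (
    long_df.pivot(index=['record', 'spend', 'date_start', 'date_stop'],
                  columns='action_type', values='value')
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover 'action_type' axis label

print(wide_df)
```

pivot sorts the action columns alphabetically, and the values remain strings; apply pd.to_numeric afterwards if you need numbers.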
