pandas: from a list of dictionaries to a DataFrame

fdx2calv  posted on 2023-04-10  in  Other

I have a list of Python dictionaries whose values are themselves dictionaries, as in the example below:

{'AU37172316199': {'B25J9/18__2016': 1}},
 {'AU41504932409': {'G06F15/00__2003': 1}},
 {'AU41687119230': {'C07K14/435__1997': 1}},
 {'AU48692741449': {'G06F15/00__2002': 1}},
 {'AU632317250': {'G16H50/00__2021': 1}},
 {'AU68078335009': {'G06F19/00__2009': 1}},
 {'BE0684625505': {'G06Q10/04__2020': 1}},
 {'BM24983R': {'G06Q50/00__2006': 1, 'G16H30/20__2007': 1}},
 {'BO00225660': {'G06V40/10__2022': 1}},
 {'BR02016440000162': {'G01R21/00__2019': 1}},
 {'BR02641663000110': {'G01R21/00__2019': 1}},
 {'BR05706282000160': {'G06F15/00__2004': 1,
   'G06F15/00__2005': 1,
   'G06F15/00__2006': 1,
   'G06F3/00__2005': 1,
   'G06K15/00__2005': 1,
   'H04N1/60__2006': 1}},
 {'BR33121512000165': {'G01R31/28__2020': 1}},
 {'CA132631008L': {'B62D1/24__2003': 1}},
 {'CA203540316L': {'H04L__2018': 1,
   'H04L25/02__2016': 1,
   'H04L25/02__2017': 1,
   'H04W24/00__2016': 1}}

I want to convert it into a DataFrame like this one (taken from an earlier version of the data source):

       Company         result             Flag
0      AE0000392297    G06F17/10__2020    0
1      AT9030014397    B81B7/00__2016     0
2      AT9030014397    B81B7/00__2017     0
3      AU010653844     A01K67/02__2021    0
4      AU010653844     G06K9/62__2021     0
...    ...             ...                ...
18829  ZM119870015693  G10L21/0208__2020  0
18830  ZM119870015693  G10L25/60__2022    0
18831  ZM119870015693  H04N19/117__2022   0
18832  ZM119870015693  H04N21/6587__2022  0
18833  ZM119980040172  G05B13/04__2020    0

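In addition, I want the numeric value stored in the inner dictionary (0 or 1) to become a new column of the DataFrame. Each key of the outer dictionary should be repeated once for every key in its inner dictionary.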


pn9klfpd1#

This should do the job:

import pandas as pd

# `data` is the list of nested dictionaries from the question
df = (
    pd.DataFrame(data)      # one column per company, one row per list item
    .T                      # transpose so that the companies become the index
    .reset_index()          # move the company keys into a column named "index"
    .melt(id_vars='index')  # stack into long format: index, variable, value
    .dropna()               # keep only the cells that hold an inner dictionary
    )

df["values"] = df["value"].apply(lambda x: list(x.items())) # (code, flag) pairs

df = df.explode("values") # one row per pair (if there are multiple ones)

df = df.drop(columns=["variable","value"]) # drop the helper columns

df["code"] = df["values"].apply(lambda x: x[0]) # get the code
df["value"] = df["values"].apply(lambda x: x[1]) # get the value

df = df.drop(columns=["values"]) # drop the intermediate values column
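
The frame built above ends up with the columns index, code and value, while the question asks for Company, result and Flag. A minimal sketch of the renaming step follows; the small `df` here is a hypothetical stand-in for the real result, and the sorting is an assumption about the desired row order rather than something the question states.

```python
import pandas as pd

# Hypothetical stand-in for the frame produced by the steps above
df = pd.DataFrame({
    "index": ["BM24983R", "AU37172316199", "BM24983R"],
    "code": ["G16H30/20__2007", "B25J9/18__2016", "G06Q50/00__2006"],
    "value": [1, 1, 1],
})

out = (
    df.rename(columns={"index": "Company", "code": "result", "value": "Flag"})
    .sort_values(["Company", "result"])  # assumed ordering: by company, then code
    .reset_index(drop=True)              # renumber the rows from 0
)
print(out)
```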

xwbd5t1u2#

import pandas as pd

# Flatten the data: one record per (company, result) pair
flattened_data = []
for item in data:
    for company, results in item.items():
        for result, flag in results.items():
            flattened_data.append({'Company' : company, 'Result' : result, 'Flag' : flag})

# Create a DataFrame from the flattened data
df = pd.DataFrame(flattened_data)

# Print the resulting DataFrame
print(df)

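The code above creates a DataFrame with three columns: Company, Result and Flag. The Company column holds the keys of the outer dictionaries, the Result column holds the keys of the inner dictionaries, and the Flag column holds the values of the inner dictionaries.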
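
For a quick check, the same flattening can be written as a list comprehension and run on a few records. This is only a sketch: the three entries are copied from the sample in the question, and the name `rows` is introduced here for illustration.

```python
import pandas as pd

# Three records copied from the question's sample data
data = [
    {'AU37172316199': {'B25J9/18__2016': 1}},
    {'BM24983R': {'G06Q50/00__2006': 1, 'G16H30/20__2007': 1}},
    {'BO00225660': {'G06V40/10__2022': 1}},
]

# One (company, result, flag) tuple per entry of each inner dictionary
rows = [
    (company, result, flag)
    for item in data
    for company, results in item.items()
    for result, flag in results.items()
]

df = pd.DataFrame(rows, columns=['Company', 'Result', 'Flag'])
print(df)
```

With these three records the frame has four rows, because 'BM24983R' has two inner keys and is therefore repeated twice.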
