python-3.x: Extracting values from a list of lists in a DataFrame column

Posted by ig9co6j1 on 2022-12-15 in Python

I have the following dataset:

import pandas as pd

df = pd.DataFrame({
    'sentence': ["sentence1", "sentence2"],
    'Parent': [[['x', 'HackOrg'], ['xx', 'Purpose'], ['xxx', 'Area'], ['xxxx', 'HackOrg']], [['xxxxx', 'Exp'], ['xxxxxx', 'Idus'], ['xxxxxxx', 'Area'], ['xxxxxxxx', 'Area']]],
})

    sentence                                             Parent
0  sentence1  [[x, HackOrg], [xx, Purpose], [xxx, Area], [xx...
1  sentence2  [[xxxxx, Exp], [xxxxxx, Idus], [xxxxxxx, Area]...

I would like to get the following output:

    sentence HackOrg Purpose Area HackOrg    Exp    Idus     Area      Area
0  sentence1       x      xx  xxx    xxxx
1  sentence2                               xxxxx  xxxxxx  xxxxxxx  xxxxxxxx

Any ideas?


iezvtpos #1

Try this:

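from itertools import count

# Shared counter: gives every [value, label] pair a unique suffix so repeated labels don't collide
c = count()
# Turn each list of [value, label] pairs into a dict keyed by "label.N"
df.Parent = df.Parent.apply(lambda x: {f"{b}.{next(c)}": a for a, b in x})
# Expand the dicts into columns and blank out the missing cells
df = pd.concat([df, df.pop("Parent").apply(pd.Series)], axis=1).fillna("")

# Strip the ".N" suffix to restore the original labels
df.columns = df.columns.str.replace(r"\.\d+$", "", regex=True)

print(df)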

Prints:

    sentence HackOrg Purpose Area HackOrg    Exp    Idus     Area      Area
0  sentence1       x      xx  xxx    xxxx
1  sentence2                               xxxxx  xxxxxx  xxxxxxx  xxxxxxxx
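
To see why the counter is needed, here is a small sketch of the intermediate step on the first row only. Without a unique suffix, the two `HackOrg` entries would share one dict key and the second would overwrite the first. The variable names `pairs`, `keyed` and `labels` are made up for this illustration.

```python
import re
from itertools import count

c = count()

# First row of the 'Parent' column from the question: [value, label] pairs
pairs = [['x', 'HackOrg'], ['xx', 'Purpose'], ['xxx', 'Area'], ['xxxx', 'HackOrg']]

# Each label gets a running number appended, so repeated labels stay distinct
keyed = {f"{b}.{next(c)}": a for a, b in pairs}
print(keyed)
# {'HackOrg.0': 'x', 'Purpose.1': 'xx', 'Area.2': 'xxx', 'HackOrg.3': 'xxxx'}

# Stripping the ".N" suffix afterwards restores the original labels
labels = [re.sub(r"\.\d+$", "", k) for k in keyed]
print(labels)
# ['HackOrg', 'Purpose', 'Area', 'HackOrg']
```

Because the counter is shared across rows, the second row continues at `.4`, which is why every pair ends up in its own column.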

zed5wv10 #2

Using a reshape:

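# One row per [value, label] pair; the original row index is repeated
s = df['Parent'].explode()

out = (pd
 .DataFrame(s.tolist(), index=s.index)   # column 0: value, column 1: label
 .reset_index().reset_index()            # 'index': source row, 'level_0': global pair position
 .pivot_table(index='index', columns=['level_0', 1], values=0, aggfunc='first')
 .droplevel('level_0', axis=1).rename_axis(index=None, columns=None)
)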

Output:

   HackOrg Purpose Area HackOrg    Exp    Idus     Area      Area
0        x      xx  xxx    xxxx    NaN     NaN      NaN       NaN
1      NaN     NaN  NaN     NaN  xxxxx  xxxxxx  xxxxxxx  xxxxxxxx
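
This output has no `sentence` column and shows `NaN` where the question asked for blanks. One possible way to close that gap is sketched below: fill the gaps while the column labels are still unique, then concatenate with the original `sentence` column. This is an addition for illustration, not part of the answer above, and `result` is a made-up name.

```python
import pandas as pd

df = pd.DataFrame({
    "sentence": ["sentence1", "sentence2"],
    "Parent": [
        [["x", "HackOrg"], ["xx", "Purpose"], ["xxx", "Area"], ["xxxx", "HackOrg"]],
        [["xxxxx", "Exp"], ["xxxxxx", "Idus"], ["xxxxxxx", "Area"], ["xxxxxxxx", "Area"]],
    ],
})

# Same reshape as in the answer: one row per pair, then pivot to wide form
s = df["Parent"].explode()
out = (
    pd.DataFrame(s.tolist(), index=s.index)
    .reset_index().reset_index()
    .pivot_table(index="index", columns=["level_0", 1], values=0, aggfunc="first")
    .fillna("")                      # blank out gaps before labels become duplicated
    .droplevel("level_0", axis=1)
    .rename_axis(index=None, columns=None)
)

# Put the sentence column back in front; rows align on the original index
result = pd.concat([df[["sentence"]], out], axis=1)
print(result)
```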
