Python Pandas - Remove words from a cell

yqyhoc1h posted on 2023-02-02 in Python

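I have a column that contains names. I want to remove a name, together with its ; separator, whenever that name is marked with (Retired) or (retired). The problem is that the marker does not always appear in the same format. Sometimes a cell holds several names and only one of them is retired. In other cases the marker sits after the first name and is followed by the last name.
The DataFrame is df.
Sample column values (current state):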

Owner Name
George (Georgy) (Retired) Clooney
Meghan (retired) Markle
Harry Porter (Retired)
Hermione Granger; Harry Porter (Retired)
Ginny Weasley; Ron Weasley; Harry Porter (retired); Luna Lovegood

Sample column values (desired state):

Owner Name
Null
Null
Null
Hermione Granger
Ginny Weasley; Ron Weasley; Luna Lovegood

I tried replacing it with "", but that did not work. Any pointers would be appreciated.
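For anyone who wants to reproduce this, the sample column can be built as below. The second part is only a guess at the attempt described above (the exact code was not posted): replacing just the marker text with "" removes "(Retired)" but keeps the retired person's name and its ; separator, which is why it does not reach the desired state.

```python
import pandas as pd

# Sample "current state" values from the question
df = pd.DataFrame({
    "Owner Name": [
        "George (Georgy) (Retired) Clooney",
        "Meghan (retired) Markle",
        "Harry Porter (Retired)",
        "Hermione Granger; Harry Porter (Retired)",
        "Ginny Weasley; Ron Weasley; Harry Porter (retired); Luna Lovegood",
    ]
})

# Assumed naive attempt: delete only the marker (any letter case) and the space before it.
# The retired person's name survives, so this is NOT the desired output.
naive = df["Owner Name"].str.replace(r"\s*\(retired\)", "", regex=True, case=False)
print(naive.tolist())
```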


Answer 1

Split the names, filter out the retired ones, then join the rest again with groupby.agg:

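# Split each cell into names, drop the retired ones, re-join what is left per row
df['Owner Name'] = (df['Owner Name']
 .str.split(r';\s*', expand=True).stack()                      # one name per row
 .loc[lambda s: ~s.str.contains(r'\(Retired\)', case=False)]   # keep non-retired names
 .groupby(level=0).agg('; '.join)                              # re-join per original row
)
# Rows with no names left are absent from the result, so the assignment fills them with NaN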
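A self-contained sketch of the same split, filter, re-join idea, run on the question's sample data. One deliberate swap: Series.explode() replaces str.split(..., expand=True).stack(), which avoids the None padding that expand=True adds for shorter rows (how stack() treats that padding depends on the pandas version). The dropna() call is an extra safeguard, assuming missing cells should simply be ignored; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Owner Name": [
        "George (Georgy) (Retired) Clooney",
        "Meghan (retired) Markle",
        "Harry Porter (Retired)",
        "Hermione Granger; Harry Porter (Retired)",
        "Ginny Weasley; Ron Weasley; Harry Porter (retired); Luna Lovegood",
    ]
})

names = (
    df["Owner Name"]
    .str.split(r";\s*")   # list of names per cell
    .explode()            # one name per row; the original row index is repeated
    .dropna()             # assumption: ignore missing cells entirely
)

# Keep only names without a "(retired)" marker, in any letter case
kept = names[~names.str.contains(r"\(retired\)", case=False, regex=True)]

# Re-join the survivors per original row; rows with no survivors are absent
joined = kept.groupby(level=0).agg("; ".join)

# Assignment aligns on the index, so the absent rows become NaN
df["Owner Name"] = joined
print(df)
```

Rows left with no names are missing from joined, so the assignment back into df produces NaN for them, matching the Null rows in the desired state.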

Output:

Owner Name
0                                        NaN
1                                        NaN
2                                        NaN
3                           Hermione Granger
4  Ginny Weasley; Ron Weasley; Luna Lovegood

Answer 2

Use a single regex replacement:

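import numpy as np
# Drop every ';'-separated name containing "(retired)" (any case), plus its trailing ';'
df['Owner Name'] = (df['Owner Name'].str.replace(r'[^;]*\(retired\)[^;]*;?', "", regex=True, case=False)
    .str.strip(';').replace("", np.nan))  # trim a leftover ';'; empty cells become NaN

Output:
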
Owner Name
0                                        NaN
1                                        NaN
2                                        NaN
3                           Hermione Granger
4  Ginny Weasley; Ron Weasley; Luna Lovegood
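A self-contained sketch of the single-regex approach. Two assumptions go beyond the answer: a hypothetical sixth row in which the retired name comes first, and strip('; ') instead of strip(';'), so that such a row does not keep a leading space once its first name is removed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Owner Name": [
        "George (Georgy) (Retired) Clooney",
        "Meghan (retired) Markle",
        "Harry Porter (Retired)",
        "Hermione Granger; Harry Porter (Retired)",
        "Ginny Weasley; Ron Weasley; Harry Porter (retired); Luna Lovegood",
        "Harry Porter (Retired); Luna Lovegood",  # hypothetical: retired name first
    ]
})

# A run of non-';' characters that contains "(retired)", plus an optional trailing ';'
pattern = r"[^;]*\(retired\)[^;]*;?"

cleaned = (
    df["Owner Name"]
    .str.replace(pattern, "", regex=True, case=False)
    .str.strip("; ")       # drop leftover separators and spaces at either end
    .replace("", np.nan)   # cells with nothing left become missing
)
print(cleaned.tolist())
```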

Execution time comparison (on the sample data only):

In [364]: %timeit df['Owner Name'].str.replace(r'[^;]*\(retired\)[^;]*;?', "", regex=True, case=False).str.strip(';').replace("", np.nan)
322 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [365]: %timeit df['Owner Name'].str.split(';\s*', expand=True).stack().loc[lambda s: ~s.str.contains('\(Retired\)', case=False)].groupby(level=0).agg('; '.join)
1.19 ms ± 8.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
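These timings were taken on five rows only, so they may not hold for a larger column. A sketch for re-measuring locally, assuming a synthetic column built by repeating the sample rows (the repeat count is arbitrary) and wrapping each answer in a hypothetical helper function. The split version is reindexed so that both helpers return one value per input row and can be compared directly; no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

sample = [
    "George (Georgy) (Retired) Clooney",
    "Meghan (retired) Markle",
    "Harry Porter (Retired)",
    "Hermione Granger; Harry Porter (Retired)",
    "Ginny Weasley; Ron Weasley; Harry Porter (retired); Luna Lovegood",
]
# Hypothetical larger input: the sample rows repeated (size chosen arbitrarily)
s = pd.Series(sample * 2000, name="Owner Name")


def regex_version(col):
    """Single-regex approach (Answer 2), with strip('; ') as a small robustness tweak."""
    return (col.str.replace(r"[^;]*\(retired\)[^;]*;?", "", regex=True, case=False)
               .str.strip("; ")
               .replace("", np.nan))


def split_version(col):
    """Split / filter / re-join approach (Answer 1), using explode instead of stack."""
    names = col.str.split(r";\s*").explode().dropna()
    kept = names[~names.str.contains(r"\(retired\)", case=False, regex=True)]
    # Reindex so rows with no surviving names come back as NaN, like the regex version
    return kept.groupby(level=0).agg("; ".join).reindex(col.index)


for fn in (regex_version, split_version):
    seconds = timeit.timeit(lambda: fn(s), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```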
