基于列中子字符串的存在过滤Pandas Dataframe

ckx4rj1h  于 2023-02-14  发布在  其他
关注(0)|答案(4)|浏览(108)

不确定这是一个“用Pandas过滤”的问题还是一个文本分析问题,但是:
给定df,

d = {
    "item": ["a", "b", "c", "d"],
    "report": [
        "john rode the subway through new york",
        "sally says she no longer wanted any fish, but",
        "was not submitted",
        "the doctor proceeded to call washington and new york",
    ],
}
df = pd.DataFrame(data=d)
df

导致

item, report
a, "john rode the subway through new york"
b, "sally says she no longer wanted any fish, but"
c, "was not submitted"
d, "the doctor proceeded to call washington and new york"

和要匹配的术语列表:

terms = ["new york", "fish"]

根据terms中的子字符串是否在report列中找到,如何减少df以包含以下行,从而保留item

item, report
a, "john rode the subway through new york"
b, "sally says she no longer wanted any fish, but"
d, "the doctor proceeded to call washington and new york"
oogrdqng

oogrdqng1#

另一种可能的解决方案基于numpy

strings = np.array(df['report'], dtype=str)
substrings = np.array(terms)

index = np.char.find(strings[:, None], substrings)
mask = (index >= 0).any(axis=1)

df.loc[mask]

输出:

item                                             report
0    a              john rode the subway through new york
1    b      sally says she no longer wanted any fish, but
3    d  the doctor proceeded to call washington and ne...
s6fujrry

s6fujrry2#

从另一个答案here中提取:
您可以将terms更改为regex可用的单个字符串(即以|分隔),然后使用df.Series.str.contains

term_str = '|'.join(terms) # makes a string of 'new york|fish'
df[df['report'].str.contains(term_str)]
fumotvh3

fumotvh33#

试试这个:
在正则表达式中使用单词边界将确保"fish"匹配,但"fishy"不匹配(作为示例)

m = df['report'].str.contains(r'\b({})\b'.format(r'|'.join(terms)))

df2 = df.loc[m]

输出:

item                                             report
0    a              john rode the subway through new york
1    b      sally says she no longer wanted any fish, but
3    d  the doctor proceeded to call washington and ne...
2uluyalo

2uluyalo4#

试试这个:

df[df['report'].apply(lambda x: any(term in x for term in terms))]

输出:

item                                             report
0    a              john rode the subway through new york
1    b      sally says she no longer wanted any fish, but
3    d  the doctor proceeded to call washington and ne...

相关问题