pandas: check if each value in a DataFrame column contains words from another DataFrame column

Posted by 8qgya5xd on 2022-12-21 in Other


How can I iterate over each value in one DataFrame column and check whether it contains a word from a column of another DataFrame?

import pandas as pd

a = pd.DataFrame({'text': ['the cat jumped over the hat', 'the pope pulled on the rope', 'i lost my dog in the fog']})
b = pd.DataFrame({'dirty_words': ['cat', 'dog', 'parakeet']})

a    
    text
0   the cat jumped over the hat
1   the pope pulled on the rope
2   i lost my dog in the fog

b
    dirty_words
0   cat
1   dog
2   parakeet

I want to get a new DataFrame that contains only the following values:

result

0   the cat jumped over the hat
1   i lost my dog in the fog

pandas

Source: https://stackoverflow.com/questions/51353120/check-if-each-value-in-a-dataframe-column-contains-words-from-another-dataframe

3 Answers


zf9nrax11#

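After splitting each string on whitespace, you can use a list comprehension with any. Because it compares whole words, this approach will not match "catheter" just because it contains "cat".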

mask = [any(i in words for i in b['dirty_words'].values) \
        for words in a['text'].str.split().values]

print(a[mask])

                          text
0  the cat jumped over the hat
2     i lost my dog in the fog
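A variation on the same idea, as a minimal sketch: the word list is turned into a set once, and each row's tokens are tested against it. The names `dirty`, `mask` and `result` are illustrative, and the sketch assumes the text column has no missing values (a NaN row would need to be handled before the set test).

```python
import pandas as pd

a = pd.DataFrame({'text': ['the cat jumped over the hat',
                           'the pope pulled on the rope',
                           'i lost my dog in the fog']})
b = pd.DataFrame({'dirty_words': ['cat', 'dog', 'parakeet']})

# Build the lookup set once; dropna() guards against missing words.
dirty = set(b['dirty_words'].dropna())

# A row matches when its whitespace-separated tokens share
# at least one word with the set (whole-word comparison).
mask = a['text'].str.split().apply(lambda tokens: not dirty.isdisjoint(tokens))

result = a[mask]
print(result)
```

Since whole tokens are compared, a row such as "a catheter was used" would not be selected by the word "cat".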


bjg7j2ky2#

Use a regular expression with str.contains.

p = '|'.join(b['dirty_words'].dropna())
# (?:...) groups the alternation so that \b applies to every word, not only the first and last
a[a['text'].str.contains(r'\b(?:{})\b'.format(p))]

                          text
0  the cat jumped over the hat
2     i lost my dog in the fog

The word boundaries ensure that you do not match "catch" just because it contains "cat" (thanks, @DSM).
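One detail worth checking when a pattern like this is built from a word list: `|` has the lowest precedence in a regular expression, so `\b` only applies to every word if the alternation is wrapped in a group. The sketch below compares the two forms on made-up sample strings. The sample data and variable names are assumptions for illustration, and `re.escape` is added in case a word contains regex metacharacters.

```python
import re
import pandas as pd

# Hypothetical sample data chosen to expose partial-word matches.
texts = pd.Series(['the cat sat', 'a catheter was used',
                   'a bulldog barked', 'my dog ran'])
words = ['cat', 'dog']

alternation = '|'.join(re.escape(w) for w in words)

# Ungrouped: parsed as (\bcat) OR (dog\b), so only one side of each word is anchored.
ungrouped = r'\b{}\b'.format(alternation)
# Grouped: both boundaries apply to every alternative.
grouped = r'\b(?:{})\b'.format(alternation)

ungrouped_hits = texts[texts.str.contains(ungrouped)].tolist()
grouped_hits = texts[texts.str.contains(grouped)].tolist()

print(ungrouped_hits)
print(grouped_hits)
```

With the ungrouped pattern, "catheter" and "bulldog" are also matched. The grouped pattern keeps only the rows that contain "cat" or "dog" as whole words.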


hxzsmxv23#

I think you can use isin after str.split:

a[pd.DataFrame(a.text.str.split().tolist()).isin(b.dirty_words.tolist()).any(axis=1)]
Out[380]: 
                          text
0  the cat jumped over the hat
2     i lost my dog in the fog
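A related way to combine str.split with isin, sketched here as an alternative rather than as the answer's own method, is to explode the token lists into one row per word and then collapse back to one boolean per original row. This assumes a pandas version that provides `Series.explode` (0.25 or later). The names `tokens`, `mask` and `result` are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'text': ['the cat jumped over the hat',
                           'the pope pulled on the rope',
                           'i lost my dog in the fog']})
b = pd.DataFrame({'dirty_words': ['cat', 'dog', 'parakeet']})

# One row per token; the original row label is repeated in the index.
tokens = a['text'].str.split().explode()

# Flag tokens found in the word list, then reduce back to one
# boolean per original row by grouping on the index level.
mask = tokens.isin(b['dirty_words']).groupby(level=0).any()

result = a[mask]
print(result)
```

This avoids building a wide intermediate DataFrame padded with None when the sentences have very different lengths.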

