Pandas: select rows where a value is in either of two columns

Posted by 1u4esq0p on 2023-05-15 in Other

I have a DataFrame like this:

Title        Description
Area 51      Aliens come to earth on the 4th of July.
Matrix       Hacker Neo discovers the shocking truth.
Spaceballs   A star-pilot for hire and his trusty sidekick must come to the rescue of a princess.

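I want to select the rows that contain the word "Space" or "Aliens" in either the Title or the Description column.
I can select rows containing "Space" using a single column, but I'm not sure how to include the second column.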

words_of_interest = ["Space", "Aliens"]
df[df["Title"].str.contains("|".join(words_of_interest))]

Title        Description
Area 51      Aliens come to earth on the 4th of July.
Spaceballs   A star-pilot for hire and his trusty sidekick must come to the rescue of a

Answer #1 (goqiplq2)

You can apply str.contains to both columns, then combine the resulting boolean masks row by row with any(axis=1):

words_of_interest = ["Space", "Aliens"]
pat = '|'.join(words_of_interest)
mask = df[['Title', 'Description']].apply(lambda x: x.str.contains(pat)).any(axis=1)

Output:

>>> df[mask]
        Title                                        Description
0     Area 51           Aliens come to earth on the 4th of July.
2  Spaceballs  A star-pilot for hire and his trusty sidekick ...
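
For reference, here is a self-contained sketch of the approach above, with the example frame rebuilt from the question. The `re.escape` call and the `na=False` argument are additions that are not in the original answer: they assume the words should be matched literally and that missing values should count as non-matches.

```python
import re

import pandas as pd

# Example frame rebuilt from the question.
df = pd.DataFrame({
    "Title": ["Area 51", "Matrix", "Spaceballs"],
    "Description": [
        "Aliens come to earth on the 4th of July.",
        "Hacker Neo discovers the shocking truth.",
        "A star-pilot for hire and his trusty sidekick must come to the rescue of a princess.",
    ],
})

words_of_interest = ["Space", "Aliens"]
# Assumption: escape each word so regex metacharacters are matched literally.
pat = "|".join(re.escape(w) for w in words_of_interest)

# Test each column against the pattern, then keep rows where any column matched.
# Assumption: na=False treats missing values as non-matches.
mask = df[["Title", "Description"]].apply(
    lambda col: col.str.contains(pat, na=False)
).any(axis=1)

result = df[mask]
print(result["Title"].tolist())
```

Passing `case=False` to `str.contains` as well would make the match case-insensitive, if that is what you need.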

Answer #2 (6bc51xsx)

Here is one option:

m = df.stack().str.contains("|".join(words_of_interest)).unstack().any(axis=1)

out = df.loc[m]

Another option, using re with a list comprehension:

import re

m = [bool(re.search("|".join(words_of_interest), str(v))) for v in df.to_numpy()]

out = df.loc[m]

Output:

print(out)

        Title                                                  Description
0     Area 51                     Aliens come to earth on the 4th of July.
2  Spaceballs  A star-pilot for hire and his trusty sidekick must come ...
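
As a quick sanity check, the two options in this answer can be run side by side on the example data to confirm they select the same rows. This is only a sketch: the frame is rebuilt from the question, and the names `m_stack` and `m_re` are introduced here for illustration.

```python
import re

import pandas as pd

# Example frame rebuilt from the question.
df = pd.DataFrame({
    "Title": ["Area 51", "Matrix", "Spaceballs"],
    "Description": [
        "Aliens come to earth on the 4th of July.",
        "Hacker Neo discovers the shocking truth.",
        "A star-pilot for hire and his trusty sidekick must come to the rescue of a princess.",
    ],
})

words_of_interest = ["Space", "Aliens"]
pat = "|".join(words_of_interest)

# Option 1: stack all cells into one Series, match, then unstack back to columns.
m_stack = df.stack().str.contains(pat).unstack().any(axis=1)

# Option 2: search the string form of each row with re.
m_re = [bool(re.search(pat, str(row))) for row in df.to_numpy()]

# Both masks should flag the same rows.
print(m_stack.tolist() == m_re)
print(df.loc[m_re, "Title"].tolist())
```

Note that the second option searches the printed form of the whole row, so it also looks at every other column the frame happens to have, whereas `str.contains` only works on string columns.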
