pandas: filter columns based on multiple conditions

Posted by ej83mcc0 on 2023-05-05 in Other


I have a DataFrame like this:

col1  col2  col3  col4
 a     a     c     c
 b     a     c     c
 c     a     c     c
 a     b     c     c
 b     b     c     c
 c     b     c     c

I have two lists, one for col1 and one for col2:

list1 = [['a', 'b'], ['a']]
list2 = ['a', 'b']

list1 holds the values to filter on in col1, and list2 holds the values for col2. Note: both lists have the same length.

Filter, e.g.: col1 is 'a' or 'b' and col2 is 'a',
           or col1 is 'a' and col2 is 'b'

I tried this:

for i, j in zip(list1, list2):
    df = df.loc[(df['col1'].isin(i)) & (df['col2'] == j)]

The result is an empty DataFrame, but my expected output is:

col1  col2  col3  col4
 a     a     c     c
 b     a     c     c
 a     b     c     c

Code to generate the data:

import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'col2': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col3': ['c', 'c', 'c', 'c', 'c', 'c'],
    'col4': ['c', 'c', 'c', 'c', 'c', 'c'],
})

expected_output = pd.DataFrame({
    'col1': ['a', 'b', 'a'],
    'col2': ['a', 'a', 'b'],
    'col3': ['c', 'c', 'c'],
    'col4': ['c', 'c', 'c']
})

list1 = [['a', 'b'], ['a']]
list2 = ['a', 'b']

pandas

Source: https://stackoverflow.com/questions/76156962/filter-columns-based-on-multiple-conditions

4 answers


icnyk63a #1

df.loc[pd.DataFrame([df.col1.isin(x) & (df.col2 == y) for x, y in zip(list1, list2)]).any(axis=0)]
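
One possible way to read this one-liner is to unpack it step by step. The sketch below uses the frame and lists from the question; the names `masks`, `stacked`, `keep`, and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'col2': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col3': ['c'] * 6,
    'col4': ['c'] * 6,
})
list1 = [['a', 'b'], ['a']]
list2 = ['a', 'b']

# One boolean Series per (col1 values, col2 value) pair.
masks = [df.col1.isin(x) & (df.col2 == y) for x, y in zip(list1, list2)]

# Stacking the Series gives one row per condition
# and one column per row label of df.
stacked = pd.DataFrame(masks)

# A row of df is kept if at least one condition matched it.
keep = stacked.any(axis=0)

result = df.loc[keep]
print(result)
```

Because `keep` is indexed by the row labels of `df`, `df.loc[keep]` aligns on those labels and the original index is preserved.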

Posted 2023-05-05

rfbsl7qr #2

IIUC, you can try using reduce to combine all the masks:

from functools import reduce

m = reduce(lambda x, y: x | y, [(df["col1"].isin(i) & (df["col2"] == j))
                                for i, j in zip(list1, list2)])

out = df.loc[m]

Output:

print(out)

  col1 col2 col3 col4
0    a    a    c    c
1    b    a    c    c
3    a    b    c    c
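
Since the question supplies an `expected_output` frame, the result can be checked against it. The sketch below is one way to do that, assuming the data from the question; it swaps the lambda for `operator.or_`, which is equivalent here, and the names `masks` and `matches` are illustrative.

```python
from functools import reduce
from operator import or_

import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'col2': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col3': ['c'] * 6,
    'col4': ['c'] * 6,
})
expected_output = pd.DataFrame({
    'col1': ['a', 'b', 'a'],
    'col2': ['a', 'a', 'b'],
    'col3': ['c', 'c', 'c'],
    'col4': ['c', 'c', 'c'],
})
list1 = [['a', 'b'], ['a']]
list2 = ['a', 'b']

# Build one mask per condition, then OR them together pairwise.
masks = [df['col1'].isin(i) & (df['col2'] == j) for i, j in zip(list1, list2)]
m = reduce(or_, masks)
out = df.loc[m]

# expected_output has a fresh 0..n-1 index, so drop the original
# labels (0, 1, 3) before comparing.
matches = out.reset_index(drop=True).equals(expected_output)
print(matches)
```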

Posted 2023-05-05

gkn4icbw #3

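When you repeatedly run for cond in conditions: df = df.loc[cond], you are effectively applying *all* of the conditions (an AND relationship). Your expected output is the set of rows where *any* condition holds (an OR relationship), like this: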

pd.concat(df.loc[df['col1'].isin(i) & df['col2'].eq(j)] 
          for i,j in zip(list1,list2))

However, instead of slicing first and then concatenating, you can use np.bitwise_or.reduce to build an OR mask and then slice once:

import numpy as np

# OR all per-condition masks together, then slice once
mask = np.bitwise_or.reduce([df['col1'].isin(i) & df['col2'].eq(j)
                             for i, j in zip(list1, list2)])
df.loc[mask]

Output:

  col1 col2 col3 col4
0    a    a    c    c
1    b    a    c    c
3    a    b    c    c
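
To illustrate the AND versus OR distinction described in this answer, the sketch below reruns the loop from the question on a copy and compares it with combining the same conditions using `&` and `|`. It assumes the data from the question; `chained`, `conditions`, `all_mask`, and `any_mask` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'col2': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col3': ['c'] * 6,
    'col4': ['c'] * 6,
})
list1 = [['a', 'b'], ['a']]
list2 = ['a', 'b']

# Sequential filtering: each pass narrows the result of the previous one.
chained = df
for i, j in zip(list1, list2):
    chained = chained.loc[chained['col1'].isin(i) & chained['col2'].eq(j)]

# The same conditions, evaluated once each against the full frame.
conditions = [df['col1'].isin(i) & df['col2'].eq(j)
              for i, j in zip(list1, list2)]
all_mask = conditions[0] & conditions[1]  # what the loop effectively computes
any_mask = conditions[0] | conditions[1]  # what the question asks for

print(chained.empty)                    # no row has col2 == 'a' and col2 == 'b'
print(df.loc[any_mask].index.tolist())
```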

Another approach is a merge, but it loses the original index:

df.merge(pd.DataFrame({'col1': list1, 'col2':list2}).explode('col1'),
         on=['col1','col2'], how='inner')

Output (note the different index):

  col1 col2 col3 col4
0    a    a    c    c
1    b    a    c    c
2    a    b    c    c
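
If the original index matters, one common workaround is to move it into a column before merging and restore it afterwards. This is a sketch under the assumption that the index is unnamed (so `reset_index` creates a column called `index`) and that the key pairs are unique; `keys` and `out` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'col2': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col3': ['c'] * 6,
    'col4': ['c'] * 6,
})
list1 = [['a', 'b'], ['a']]
list2 = ['a', 'b']

# One row per allowed (col1, col2) pair.
keys = pd.DataFrame({'col1': list1, 'col2': list2}).explode('col1')

out = (
    df.reset_index()                          # keep the labels as a column
      .merge(keys, on=['col1', 'col2'], how='inner')
      .set_index('index')                     # restore the original labels
      .rename_axis(None)                      # drop the leftover index name
)
print(out)
```

If the same pair could appear more than once in `keys`, the merge would repeat the matching rows, so calling `keys.drop_duplicates()` first may be worth considering.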

Posted 2023-05-05

oxf4rvwz #4

Another possible solution, based entirely on list comprehensions:

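# y is a single value, so pair it directly with each x1;
# iterating over y would split a multi-character string into characters
df.loc[[z in [(x1, y) for x, y in zip(list1, list2) for x1 in x]
        for z in zip(df['col1'], df['col2'])]]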

Output:

  col1 col2 col3 col4
0    a    a    c    c
1    b    a    c    c
3    a    b    c    c
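
A variation on the same idea is to build the allowed (col1, col2) pairs once as a set and test each row against it, which avoids rebuilding the list of pairs for every row. This sketch assumes each entry of list2 is a single scalar value, as in the question; `pairs`, `mask`, and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'col2': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col3': ['c'] * 6,
    'col4': ['c'] * 6,
})
list1 = [['a', 'b'], ['a']]
list2 = ['a', 'b']

# Expand each (list of col1 values, col2 value) entry into explicit pairs.
pairs = {(x1, y) for x, y in zip(list1, list2) for x1 in x}

# One boolean per row: is this row's (col1, col2) an allowed pair?
mask = [pair in pairs for pair in zip(df['col1'], df['col2'])]

result = df.loc[mask]
print(result)
```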

Posted 2023-05-05
