result = df[["testA", "testB"]].notna().value_counts()
result.index = result.index.map({
    (True, True): "Both A and B",
    (True, False): "A but not B",
    (False, True): "B but not A",
    (False, False): "Neither A nor B",
})
Result:
Both A and B 3
A but not B 2
Neither A nor B 1
B but not A 1
a_exists = df["testA"].notna()
b_exists = df["testB"].notna()
# both
>>> (a_exists & b_exists).sum()
3
# A, but not B
>>> (a_exists & ~b_exists).sum()
2
# B, but not A
>>> (~a_exists & b_exists).sum()
1
But this can be automated with a bit of itertools:
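from itertools import compress, product
cols = ["A", "B"]
# ma / mb are 1 when the corresponding column should be missing
for ma, mb in product([0, 1], repeat=2):
    # skip the "neither A nor B" combination
    if ma == mb == 1:
        continue
    # keep only the labels of the columns that are present
    ab_info = "".join(compress(cols, (1 - ma, 1 - mb)))
    # XOR with 1 negates the boolean mask, XOR with 0 leaves it unchanged
    counts = ((a_exists ^ ma) & (b_exists ^ mb)).sum()
    print(ab_info, counts)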
3 Answers
tp5buhyn1#
Apply notna() to the two test columns, then call value_counts on the result and map the index to readable labels.
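For readers who want to try the notna() / value_counts approach end to end, here is a minimal sketch. The sample DataFrame is hypothetical (the question's data is not shown here), and the summary dict is only an illustrative way to collect the labelled counts.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not the question's data):
# NaN marks a test that has no value.
df = pd.DataFrame({
    "testA": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, np.nan],
    "testB": [1.0, 2.0, 3.0, np.nan, np.nan, 6.0, np.nan],
})

labels = {
    (True, True): "Both A and B",
    (True, False): "A but not B",
    (False, True): "B but not A",
    (False, False): "Neither A nor B",
}

# notna() yields a boolean frame; value_counts() counts each distinct
# (testA present, testB present) pair.
counts = df[["testA", "testB"]].notna().value_counts()

# Collect the counts under readable labels.
summary = {labels[key]: int(n) for key, n in counts.items()}
print(summary)
```

Combinations that never occur in the data simply do not appear in the value_counts result, so the summary may have fewer than four entries on other inputs.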
lmyy7pcs2#
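For two columns, each condition can be queried separately.
But this can be automated with a bit of itertools.
compress picks the labels of the columns selected in each combination, which is what gets printed alongside the count.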
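To see what compress contributes on its own, the following sketch builds only the labels, without any DataFrame. The labels dict is a name introduced here for illustration; the 0/1 mask convention (1 means the column is missing) follows the loop in this answer.

```python
from itertools import compress, product

cols = ["A", "B"]

# Map each (ma, mb) mask pair to the label compress() builds for it.
# A mask value of 1 means "this column is missing", so the selector
# passed to compress() is the inverse, 1 - mask.
labels = {}
for ma, mb in product([0, 1], repeat=2):
    selectors = (1 - ma, 1 - mb)
    labels[(ma, mb)] = "".join(compress(cols, selectors))

print(labels)
```

The (1, 1) pair produces an empty label, which is why the answer's loop skips that combination.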
agxfikkp3#
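Following fmarc's answer to "How to replace all non-NaN entries of a dataframe with 1 and all NaN with 0", we can convert the DataFrame so that it contains only 0s and 1s.
Then I replace the 0s and 1s in column 'testA' with 'not A' and 'A', and do the same for column 'testB'.
I do this to simplify the next step, which is to add the two strings from 'testA' and 'testB' together and take their value_counts.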
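The steps described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the answerer's original code: the sample DataFrame is hypothetical, Series.map stands in for the replacement step, and the ", " separator between the two strings is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not the question's data).
df = pd.DataFrame({
    "testA": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, np.nan],
    "testB": [1.0, 2.0, 3.0, np.nan, np.nan, 6.0, np.nan],
})

# Step 1: 1 where a value exists, 0 where it is NaN.
flags = df[["testA", "testB"]].notna().astype(int)

# Step 2: swap the 0/1 flags for readable strings
# (map is used here in place of replace; an assumption).
a_text = flags["testA"].map({0: "not A", 1: "A"})
b_text = flags["testB"].map({0: "not B", 1: "B"})

# Step 3: join the two strings row by row and count each combination.
combo_counts = (a_text + ", " + b_text).value_counts()
print(combo_counts)
```

Turning the flags into strings first means the final index is already human-readable, so no separate relabelling step is needed after value_counts.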
The last line of code should produce the result you want. Here is what I got. Input:
Output: