pandas Python: Counting the number of combinations in a DataFrame

Posted by ygya80vv on 2023-01-15 in Python


What is a simple way to count the combinations across two columns? Given the following DataFrame:

df =
id testA testB
1  3     NA
1  1     3
2  2     NA
2  NA    1
2  0     0
3  NA    NA
3  1     1

I want to count the different combinations, regardless of the scores themselves. For example:

Both tests: 3
A but not B: 2
B but not A: 1

pandas

Source: https://stackoverflow.com/questions/75085393/python-counting-the-number-of-combinations-dataframe

3 Answers


tp5buhyn1#

Call notna() on the two test columns, then call value_counts():

result = df[["testA", "testB"]].notna().value_counts()
result.index = result.index.map({
    (True, True): "Both A and B",
    (True, False): "A but not B",
    (False, True): "B but not A",
    (False, False): "Neither A nor B"
})

Result:

Both A and B       3
A but not B        2
Neither A nor B    1
B but not A        1
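
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The construction of df is an assumption based on the table in the question ("NA" is taken to mean a missing value and written as np.nan), and the name presence is introduced here only for illustration. DataFrame.value_counts requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; "NA" is treated as NaN.
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 2, 3, 3],
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# True where a score exists, regardless of its value (a score of 0 counts as present).
presence = df[["testA", "testB"]].notna()

# Count each distinct (testA present, testB present) row pattern.
result = presence.value_counts()

print(result)
```

Note that the row with scores 0 and 0 is counted under "both", because notna() only checks for missing values, not for truthiness.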


lmyy7pcs2#

With two columns, you can query each condition separately:

a_exists = df["testA"].notna()
b_exists = df["testB"].notna()

# both
>>> (a_exists & b_exists).sum()
3

# A, but not B
>>> (a_exists & ~b_exists).sum()
2

# B, but not A
>>> (~a_exists & b_exists).sum()
1

But this can be automated with a bit of itertools:

from itertools import compress, product

cols = ["A", "B"]
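for ma, mb in product([0, 1], repeat=2):
    # A selector of 1 negates its column, so (1, 1) means "neither"; skip it.
    if ma == mb == 1:
        continue
    # Keep the names of the columns that are not negated.
    ab_info = "".join(compress(cols, (1 - ma, 1 - mb)))
    # XOR with 1 flips the boolean Series; XOR with 0 leaves it unchanged.
    counts = ((a_exists ^ ma) & (b_exists ^ mb)).sum()

    print(ab_info, counts)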

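Get the "selectors" over [0, 1] x 2
If both are 1, i.e. neither test exists, skip the selector
Otherwise:
    use compress to get the selected columns
    XOR each existence mask with its selector to negate it where needed, then AND the results
    sum to get the total count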

which prints

AB 3
A 2
B 1
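
The same idea extends past two columns. The sketch below is one possible generalization, not part of the original answer: the helper name count_presence_patterns and the sample frame are assumptions for illustration. It uses True/False patterns instead of 0/1 selectors and compares them against the presence mask directly rather than via XOR.

```python
from itertools import compress, product

import numpy as np
import pandas as pd


def count_presence_patterns(df, cols):
    """Hypothetical helper: count rows for each present/absent pattern of `cols`."""
    present = df[cols].notna().to_numpy()
    counts = {}
    for pattern in product([True, False], repeat=len(cols)):
        if not any(pattern):
            continue  # skip the pattern where every column is missing
        # A row matches when each column's presence equals the pattern.
        matches = (present == np.array(pattern)).all(axis=1)
        # Label the pattern by the columns that are present in it.
        label = " & ".join(compress(cols, pattern))
        counts[label] = int(matches.sum())
    return counts


# Assumed reconstruction of the question's frame; "NA" is treated as NaN.
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 2, 3, 3],
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

pattern_counts = count_presence_patterns(df, ["testA", "testB"])
print(pattern_counts)
```

Because the number of patterns doubles with every added column, this is only practical for a handful of columns.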


agxfikkp3#

Following fmarc's answer to "How to replace all non-NaN entries of a dataframe with 1 and all NaN with 0", we can convert the DataFrame so that it contains only 0s and 1s.

df = df.notnull().astype('int')

Then I replace the 0s and 1s in column 'testA' with 'not A' and 'A', and do the same for column 'testB'.

# Assign the result back instead of using inplace=True on a column selection
df['testA'] = df['testA'].replace({1: 'A', 0: 'not A'})
df['testB'] = df['testB'].replace({1: 'B', 0: 'not B'})

I did this to simplify the next step: concatenating the two strings from 'testA' and 'testB' and taking their value_counts:

df['sum'] = df['testA'] + ' + ' + df['testB']
df['sum'].value_counts()

The last line of code should produce the result you want. Here is what I got. Input:

   id  testA  testB
0   1    3.0    NaN
1   1    1.0    3.0
2   2    2.0    NaN
3   2    NaN    1.0
4   2    0.0    0.0
5   3    NaN    NaN
6   3    1.0    1.0

Output:

A + B            3
A + not B        2
not A + B        1
not A + not B    1
Name: sum, dtype: int64
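
A variant that leaves the original frame untouched might look like the sketch below. It is an alternative under stated assumptions, not the answer's exact method: it swaps the replace calls for Series.map on the notna() mask, and the frame construction and the names label_a, label_b, and combo are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; "NA" is treated as NaN.
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 2, 3, 3],
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# Map each presence flag straight to its label; df itself is not modified.
label_a = df["testA"].notna().map({True: "A", False: "not A"})
label_b = df["testB"].notna().map({True: "B", False: "not B"})

# Join the two labels per row and count the resulting combinations.
combo = label_a + " + " + label_b
combo_counts = combo.value_counts()

print(combo_counts)
```

Keeping the labels in separate Series means the numeric scores in df stay available for later steps.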

