Faster way to apply a lambda function to a Pandas groupby

Posted by gcxthw6b on 2023-01-15 in Other


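I have a Pandas DataFrame that is a small DataFrame repeated a large number of times, except that one column is not repeated. I want to apply a function that operates on this non-repeated column and on one of the repeated columns. The whole process is slow, though, and I need a faster alternative. Here is a minimal example: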

import pandas as pd
import numpy as np
import random

repeating_times = 4
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5]*repeating_times,
                   "col2": ['a', 'b', 'c', 'd', 'e']*repeating_times,
                   "true": ['P', 'P', 'N', 'P', 'N']*repeating_times,
                   "pred": random.choices(["P", "N"], k=5*repeating_times)})

grps = df.groupby(by=["col1", "col2"])
true_pos = grps.apply(lambda gr: np.sum(gr[gr['pred'] == 'P']["true"] == 'P'))
true_pos

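true_pos counts the true positive samples (rows where both the predicted value and the true value are the positive class) for every (col1, col2) group.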

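**Update:** A better approach is to use agg instead of apply: compute the boolean true_pos column once for the whole DataFrame, then sum it per group.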

repeating_times = 4
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5]*repeating_times,
                   "col2": ['a', 'b', 'c', 'd', 'e']*repeating_times,
                   "true": ['P', 'P', 'N', 'P', 'N']*repeating_times,
                   "pred": random.choices(["P", "N"], k=5*repeating_times)})

df["true_pos"] = (df["true"]=="P") & (df["pred"]=="P")

true_pos = df.groupby(["col1", "col2"]).agg({"true_pos": "sum"})
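As a sanity check, the two approaches above can be compared on the same data. The sketch below is only an illustration: the fixed seed, the larger `repeating_times`, and the names `slow`, `fast`, and `flagged` are assumptions added here, not part of the original post. It selects the `true` and `pred` columns before calling `apply` so that the lambda sees only the columns it needs.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed and a larger repeat count, chosen only for illustration.
rng = np.random.default_rng(0)
repeating_times = 1000
n = 5 * repeating_times

df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5] * repeating_times,
    "col2": ["a", "b", "c", "d", "e"] * repeating_times,
    "true": ["P", "P", "N", "P", "N"] * repeating_times,
    "pred": rng.choice(["P", "N"], size=n),
})

# Original approach: a Python lambda is called once per group.
slow = df.groupby(["col1", "col2"])[["true", "pred"]].apply(
    lambda gr: ((gr["pred"] == "P") & (gr["true"] == "P")).sum()
)

# Updated approach: build the boolean column once, then aggregate per group.
flagged = df.assign(true_pos=(df["true"] == "P") & (df["pred"] == "P"))
fast = flagged.groupby(["col1", "col2"]).agg({"true_pos": "sum"})["true_pos"]

# Both results are ordered by the sorted group keys, so values can be compared directly.
same = bool((slow.to_numpy() == fast.to_numpy()).all())
print(same)
```

The groups (3, c) and (5, e) always have a true label of N, so their true positive count is 0 regardless of the random predictions.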

pandas

Source: https://stackoverflow.com/questions/75089555/faster-way-to-apply-lambda-function-to-pandas-groupby

1 Answer


py49o6xq1#

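In cases like this you can approach the problem from another angle: first compute the inner condition, i.e. that both "true" and "pred" are "P", then group that result by col1 and col2 and sum it: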

>>> (df["true"].eq("P") & df["pred"].eq("P")).groupby([df["col1"], df["col2"]]).sum()

col1  col2
1     a       4
2     b       2
3     c       0
4     d       0
5     e       0
dtype: int64

This is what gets grouped:

>>> (df["true"].eq("P") & df["pred"].eq("P"))

0      True
1     False
2     False
3     False
4     False
5      True
6      True
7     False
8     False
9     False
10     True
11     True
12    False
13    False
14    False
15     True
16    False
17    False
18    False
19    False
dtype: bool

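Then .groupby works out which positions in this boolean Series belong to each unique (col1, col2) pair and sums the values at those positions. Since True counts as 1 and False as 0, each sum is the number of true positives in that group.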
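To make the grouping step concrete, here is a small sketch with hand-picked predictions instead of random ones, so the result can be verified by eye. The `pred` values and the names `mask` and `counts` are assumptions chosen for illustration only.

```python
import pandas as pd

# Assumption: fixed predictions (instead of random.choices) so the output is reproducible.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5] * 2,
    "col2": ["a", "b", "c", "d", "e"] * 2,
    "true": ["P", "P", "N", "P", "N"] * 2,
    "pred": ["P", "N", "P", "P", "N", "P", "P", "N", "N", "P"],
})

# Row-wise condition: True where both the true label and the prediction are "P".
mask = df["true"].eq("P") & df["pred"].eq("P")

# Group the boolean Series by the key columns (aligned on the index) and sum.
counts = mask.groupby([df["col1"], df["col2"]]).sum()

print(mask.tolist())
print(counts.tolist())  # [2, 1, 0, 1, 0]
```

Here rows 0, 3, 5 and 6 satisfy the condition. Rows 0 and 5 fall into group (1, a), row 6 into (2, b) and row 3 into (4, d), which gives the counts 2, 1, 0, 1, 0.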

Answered on 2023-01-15
