Cumcount and groupby with multiple columns

9ceoxa92  posted on 2022-09-21  in  Other

I have a DataFrame:

    id  value  Date
0   a   w      2022-09-30
1   a   l      2022-08-01
2   a   l      2022-07-15
3   a   w      2022-07-11
4   a   w      2022-06-12
5   a   w      2022-06-11
6   a   l      2022-06-10
7   b   l      2022-07-18
8   b   w      2022-07-14
9   b   l      2022-07-11
10  b   w      2022-07-01

When I run df.groupby("id").cumcount(ascending=False), it returns:

0  6       
1  5       
2  4       
3  3       
4  2       
5  1       
6  0
7  3
8  2    
9  1    
10 0
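For reference, cumcount(ascending=False) just numbers the rows of each id from n-1 down to 0, so it counts every row below the current one regardless of value. A minimal sketch (the DataFrame construction is typed in from the sample above) that compares this with grouping by both id and value:

```python
import pandas as pd

# Sample data from the question (rows are sorted by Date, newest first, within each id)
df = pd.DataFrame({
    "id":    ["a"] * 7 + ["b"] * 4,
    "value": ["w", "l", "l", "w", "w", "w", "l", "l", "w", "l", "w"],
    "Date":  pd.to_datetime([
        "2022-09-30", "2022-08-01", "2022-07-15", "2022-07-11", "2022-06-12",
        "2022-06-11", "2022-06-10", "2022-07-18", "2022-07-14", "2022-07-11",
        "2022-07-01",
    ]),
})

# Numbers rows n-1, ..., 0 inside each id: the count of rows below, whatever their value.
by_id = df.groupby("id").cumcount(ascending=False)

# Grouping by id AND value restricts the count to rows with the same value,
# i.e. the number of same-value rows below the current row within its id.
by_id_value = df.groupby(["id", "value"]).cumcount(ascending=False)

print(pd.DataFrame({"value": df["value"], "by_id": by_id, "by_id_value": by_id_value}))
```

In the sample, by_id_value equals cumcount_w on the w rows and cumcount_l on the l rows of the desired output, but it says nothing about the other value, which is why a plain cumcount is not enough here.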

However, I want a separate cumcount of w and of l within each id, so the desired output would look like this:

    id  value  Date        cumcount_w  cumcount_l
0   a   w      2022-09-30  3           3
1   a   l      2022-08-01  3           2
2   a   l      2022-07-15  3           1
3   a   w      2022-07-11  2           1
4   a   w      2022-06-12  1           1   # there is 1 w and 1 l before this date
5   a   w      2022-06-11  0           1   # there is 1 l before this date
6   a   l      2022-06-10  0           0   # both start with zero
7   b   l      2022-07-18  2           1
8   b   w      2022-07-14  1           1
9   b   l      2022-07-11  1           0
10  b   w      2022-07-01  0           0

How can I do this with the cumcount function?
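One possible sketch, assuming the rows are already sorted newest first within each id as in the sample. Note that it swaps cumcount for cumsum: cumcount only counts rows in the current row's own group, so it cannot give an l row its count of w rows. The idea is to build a 0/1 indicator per value, take a per-id running sum from the bottom of the frame upwards, and subtract the current row so only strictly older rows are counted:

```python
import pandas as pd

# Sample data from the question (newest row first within each id)
df = pd.DataFrame({
    "id":    ["a"] * 7 + ["b"] * 4,
    "value": ["w", "l", "l", "w", "w", "w", "l", "l", "w", "l", "w"],
    "Date":  pd.to_datetime([
        "2022-09-30", "2022-08-01", "2022-07-15", "2022-07-11", "2022-06-12",
        "2022-06-11", "2022-06-10", "2022-07-18", "2022-07-14", "2022-07-11",
        "2022-07-01",
    ]),
})

for v in ["w", "l"]:
    # 1 where the row has value v, else 0
    is_v = (df["value"] == v).astype(int)
    # Reverse the rows, take a running sum within each id, then reverse back:
    # this counts the v rows at or below the current row.
    at_or_below = is_v[::-1].groupby(df["id"][::-1]).cumsum()[::-1]
    # Drop the current row itself so only strictly older rows are counted.
    df[f"cumcount_{v}"] = at_or_below - is_v

print(df)
```

If the frame is not already in that order, sorting it first with df.sort_values(["id", "Date"], ascending=[True, False]) should make the same loop apply.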

Answer #1 (l7wslrjt)

This seems to work (I only handled cumcount_w):

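import pandas as pd

# 1 if the row's value is "w", else 0
df["is_w"] = df["value"].apply(lambda x: int(x == "w"))

# running count of w within each id, from the newest row downwards
df["cumsum_w"] = df.groupby("id")["is_w"].cumsum()

# total number of w per id
df_group = df.groupby("id")["cumsum_w"].max().reset_index().rename(columns={"cumsum_w": "max_count_group_w"})

# attach each id's total to its rows
df = pd.merge(left=df, right=df_group, on="id")

# w rows strictly below (older than) the current row = total minus running count
df["cumcount_w"] = df["max_count_group_w"] - df["cumsum_w"]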
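The same idea extends to l as well. A minimal sketch, assuming the sample frame as given: groupby(...).transform("sum") broadcasts each id's total back onto its rows, which stands in for the max + merge steps above, and the loop covers both values:

```python
import pandas as pd

# Sample data from the question (newest row first within each id)
df = pd.DataFrame({
    "id":    ["a"] * 7 + ["b"] * 4,
    "value": ["w", "l", "l", "w", "w", "w", "l", "l", "w", "l", "w"],
    "Date":  pd.to_datetime([
        "2022-09-30", "2022-08-01", "2022-07-15", "2022-07-11", "2022-06-12",
        "2022-06-11", "2022-06-10", "2022-07-18", "2022-07-14", "2022-07-11",
        "2022-07-01",
    ]),
})

for v in ["w", "l"]:
    # 1 where the row has value v, else 0
    is_v = (df["value"] == v).astype(int)
    # running count of v from the top of each id (the cumsum_w step above)
    running = is_v.groupby(df["id"]).cumsum()
    # per-id total of v, repeated on every row of that id
    total = is_v.groupby(df["id"]).transform("sum")
    # v rows strictly below the current row
    df[f"cumcount_{v}"] = total - running

print(df)
```

Because transform keeps the original index, no merge is needed and the row order is left untouched.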
