Python: create a column based on conditions and a calculation

Posted by 0ejtzxu1 on 2022-11-21 in Python

Here is my DataFrame:

import pandas as pd

df = pd.DataFrame({"ID" : [1, 1, 2, 2, 2, 3, 3],
                  "length" : [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
                  "comment" : ["typed", "handwritten", "typed", "typed", "handwritten", "handwritten", "handwritten"]})
df

    ID  length  comment
0   1   0.7     typed
1   1   0.7     handwritten
2   2   0.8     typed
3   2   0.6     typed
4   2   0.6     handwritten
5   3   0.9     handwritten
6   3   0.9     handwritten

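I would like to do the following:
For each group of rows sharing an ID, if the lengths are all the same but the comments differ, calculate the length for every row in that group with the "typed" formula (5 x length). Otherwise, calculate each row's length with the formula that matches its own comment: typed = 5 x length, handwritten = 7 x length.
The desired output is: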

    ID  length  comment         Calculated Length
0   1   0.7     typed           5*length
1   1   0.7     handwritten     5*length
2   2   0.8     typed           5*length
3   2   0.6     typed           5*length
4   2   0.6     handwritten     7*length
5   3   0.9     handwritten     7*length
6   3   0.9     handwritten     7*length
Thank you.

67up9zun  #1

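Use groupby to find the IDs whose group has a single length but more than one comment. Then combine those IDs with the comment column in np.where to compute Calculated length, as shown below: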

>>> import numpy as np
>>> grp_ids = df.groupby("ID")[["length", "comment"]].nunique()
>>> grp_ids
    length  comment
ID
1        1        2
2        2        2
3        1        1
>>> idx = grp_ids.index[(grp_ids["length"] == 1) & (grp_ids["comment"] != 1)]
>>> idx
Int64Index([1], dtype='int64', name='ID')
>>> df["Calculated length"] = np.where(
...     df["ID"].isin(idx) | (df["comment"] == "typed"),
...     df["length"] * 5,
...     df["length"] * 7,
... )
>>> df
   ID  length      comment  Calculated length
0   1     0.7        typed                3.5
1   1     0.7  handwritten                3.5
2   2     0.8        typed                4.0
3   2     0.6        typed                3.0
4   2     0.6  handwritten                4.2
5   3     0.9  handwritten                6.3
6   3     0.9  handwritten                6.3
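
The same idea can also be written as a standalone script. The following is a minimal sketch, assuming the sample data from the question; it uses groupby(...).transform("nunique") so that the per-group counts line up with the original rows and no isin lookup is needed. The name same_len_mixed is only illustrative and does not appear in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "length": [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
    "comment": ["typed", "handwritten", "typed", "typed",
                "handwritten", "handwritten", "handwritten"],
})

# Number of distinct values per ID group, broadcast back to every row.
n_lengths = df.groupby("ID")["length"].transform("nunique")
n_comments = df.groupby("ID")["comment"].transform("nunique")

# True for rows whose group has one length but more than one comment.
same_len_mixed = (n_lengths == 1) & (n_comments > 1)

# Those rows, plus every "typed" row, use factor 5; the rest use factor 7.
df["Calculated length"] = np.where(
    same_len_mixed | (df["comment"] == "typed"),
    df["length"] * 5,
    df["length"] * 7,
)
print(df)
```

Because transform returns a Series indexed like df, the boolean mask can be combined directly with the row-level comment test.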

zf9nrax1  #2

If the comment column contains only "typed" or "handwritten", use np.where:

import numpy as np
cond1 = df['comment'] == 'typed'
df.assign(Calculated_Length=np.where(cond1, df['length'] * 5, df['length'] * 7))

Output:

    ID  length  comment     Calculated_Length
0   1   0.7     typed       3.5
1   1   0.7     handwritten 4.9
2   2   0.8     typed       4.0
3   2   0.6     typed       3.0
4   2   0.6     handwritten 4.2
5   3   0.9     handwritten 6.3
6   3   0.9     handwritten 6.3

Edit after the comment:

cond1 = df['comment'] == 'typed'
# True for every row of a group whose lengths are all equal and that contains at least one 'typed' row
cond2 = df.groupby('ID')['length'].transform(lambda x: (x.max() == x.min()) & (df.loc[x.index, 'comment'].eq('typed').sum() > 0))
df.assign(Calculated_Length=np.where((cond1 | cond2), df['length']*5, df['length']*7))

Output:

    ID  length  comment     Calculated_Length
0   1   0.7     typed       3.5
1   1   0.7     handwritten 3.5
2   2   0.8     typed       4.0
3   2   0.6     typed       3.0
4   2   0.6     handwritten 4.2
5   3   0.9     handwritten 6.3
6   3   0.9     handwritten 6.3
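
If more comment types were ever added, the two-branch np.where would need to grow. One possible variation, sketched here under the assumption that each comment type has a fixed multiplier, keeps the multipliers in a dict and overrides them for the special groups. The names factors, same_length and mixed_comments are hypothetical and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "length": [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
    "comment": ["typed", "handwritten", "typed", "typed",
                "handwritten", "handwritten", "handwritten"],
})

# Multiplier for each comment type (values taken from the question).
factors = {"typed": 5, "handwritten": 7}

by_id = df.groupby("ID")
same_length = by_id["length"].transform("nunique") == 1
mixed_comments = by_id["comment"].transform("nunique") > 1

# Start from the per-comment multiplier, then force the "typed" multiplier
# on groups that share one length but mix comment types.
factor = df["comment"].map(factors).mask(same_length & mixed_comments, factors["typed"])

result = df.assign(Calculated_Length=df["length"] * factor)
print(result)
```

Like the answer's code, this uses assign, so df itself is left unchanged and the new column lives only in result.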
