python: create a column based on conditions and a calculation

Posted by 0ejtzxu1 on 2022-11-21 in Python


Here is my DataFrame:

import pandas as pd

df = pd.DataFrame({"ID" : [1, 1, 2, 2, 2, 3, 3],
                  "length" : [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
                  "comment" : ["typed", "handwritten", "typed", "typed", "handwritten", "handwritten", "handwritten"]})
df

    ID  length  comment
0   1   0.7     typed
1   1   0.7     handwritten
2   2   0.8     typed
3   2   0.6     typed
4   2   0.6     handwritten
5   3   0.9     handwritten
6   3   0.9     handwritten

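I would like to do the following:
For each ID group, if the lengths are the same but the comments differ, calculate the length for that whole ID group with the "typed" formula (5 x length); otherwise, calculate each row with the formula that applies to its own comment: typed = 5 x length, handwritten = 7 x length.
The desired output is: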

    ID  length  comment         Calculated Length
0   1   0.7     typed           5*length
1   1   0.7     handwritten     5*length
2   2   0.8     typed           5*length
3   2   0.6     typed           5*length
4   2   0.6     handwritten     7*length
5   3   0.9     handwritten     7*length
6   3   0.9     handwritten     7*length

Thanks!


Source: https://stackoverflow.com/questions/74505643/create-a-column-based-on-conditions-and-calculation

2 answers


Answer #1 (67up9zun)

Use groupby to find the IDs that satisfy the condition. Then, using those IDs together with comment, compute Calculated length with np.where, as shown below:

>>> import numpy as np
>>> grp_ids = df.groupby("ID")[["length", "comment"]].nunique()
>>> grp_ids
    length  comment
ID
1        1        2
2        2        2
3        1        1
>>> # IDs whose group has a single distinct length but more than one comment
>>> idx = grp_ids.index[(grp_ids["length"] == 1) & (grp_ids["comment"] != 1)]
>>> idx
Int64Index([1], dtype='int64', name='ID')
>>> df["Calculated length"] = np.where(
...     df["ID"].isin(idx) | (df["comment"] == "typed"),
...     df["length"] * 5,
...     df["length"] * 7
... )
>>> df
   ID  length      comment  Calculated length
0   1     0.7        typed                3.5
1   1     0.7  handwritten                3.5
2   2     0.8        typed                4.0
3   2     0.6        typed                3.0
4   2     0.6  handwritten                4.2
5   3     0.9  handwritten                6.3
6   3     0.9  handwritten                6.3
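The same group test can be expressed without the intermediate grp_ids/idx lookup: groupby(...).transform("nunique") broadcasts each group's count back onto every row, so the masks are already row-aligned and no isin is needed. A minimal sketch, assuming the sample df from the question and reusing this answer's column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3],
                   "length": [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
                   "comment": ["typed", "handwritten", "typed", "typed",
                               "handwritten", "handwritten", "handwritten"]})

# Per-row counts of distinct lengths / comments within the row's ID group
n_len = df.groupby("ID")["length"].transform("nunique")
n_com = df.groupby("ID")["comment"].transform("nunique")

# Whole group gets the "typed" formula: one length, more than one comment
use_typed_group = (n_len == 1) & (n_com > 1)

# Otherwise fall back to the per-row formula for the row's own comment
df["Calculated length"] = np.where(
    use_typed_group | (df["comment"] == "typed"),
    df["length"] * 5,
    df["length"] * 7,
)
print(df)
```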

Answered 2022-11-21

Answer #2 (zf9nrax1)

If the comment column only ever contains typed or handwritten, you can use np.where.

import numpy as np
# True for typed rows; every other row is treated as handwritten
cond1 = df['comment'] == 'typed'
df.assign(Calculated_Length=np.where(cond1, df['length'] * 5, df['length'] * 7))

Output:

    ID  length  comment     Calculated_Length
0   1   0.7     typed       3.5
1   1   0.7     handwritten 4.9
2   2   0.8     typed       4.0
3   2   0.6     typed       3.0
4   2   0.6     handwritten 4.2
5   3   0.9     handwritten 6.3
6   3   0.9     handwritten 6.3
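If comment can take more than these two values, a lookup table scales better than nesting np.where calls. A sketch using Series.map with a hypothetical FACTORS dict (not part of the question); like the snippet above, it applies only the per-comment formula and not the group rule:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3],
                   "length": [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
                   "comment": ["typed", "handwritten", "typed", "typed",
                               "handwritten", "handwritten", "handwritten"]})

# Hypothetical lookup: multiplier per comment type (extend as needed)
FACTORS = {"typed": 5, "handwritten": 7}

# map() yields NaN for any comment value missing from FACTORS
factor = df["comment"].map(FACTORS)

# assign() returns a new DataFrame; df itself is left unchanged
out = df.assign(Calculated_Length=df["length"] * factor)
print(out)
```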

Edited after the comment:

# Row level: typed rows always use 5 x length
cond1 = df['comment'] == 'typed'
# Group level: all lengths in the ID group are equal and the group has a typed row;
# transform broadcasts each group's scalar result back to every row of the group
cond2 = df.groupby('ID')['length'].transform(lambda x: (x.max() == x.min()) & (df.loc[x.index, 'comment'].eq('typed').sum() > 0))
df.assign(Calculated_Length=np.where((cond1 | cond2), df['length']*5, df['length']*7))

Output:

    ID  length  comment     Calculated_Length
0   1   0.7     typed       3.5
1   1   0.7     handwritten 3.5
2   2   0.8     typed       4.0
3   2   0.6     typed       3.0
4   2   0.6     handwritten 4.2
5   3   0.9     handwritten 6.3
6   3   0.9     handwritten 6.3
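cond2 relies on transform broadcasting the scalar each lambda returns onto every row of its group. A sketch of the same mask split into two steps, where the "group contains a typed row" test is computed on a grouped boolean Series instead of reaching back into df from inside the lambda (the names same_length and has_typed are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3],
                   "length": [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
                   "comment": ["typed", "handwritten", "typed", "typed",
                               "handwritten", "handwritten", "handwritten"]})

# One scalar per group (all lengths equal?), broadcast to each row of the group
same_length = df.groupby("ID")["length"].transform(lambda x: x.max() == x.min())

# True on every row whose ID group has at least one typed row
has_typed = df["comment"].eq("typed").groupby(df["ID"]).transform("any")

cond1 = df["comment"] == "typed"
cond2 = same_length & has_typed

out = df.assign(Calculated_Length=np.where(cond1 | cond2,
                                           df["length"] * 5,
                                           df["length"] * 7))
print(out)
```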

Answered 2022-11-21
