Calculate score change and add it to a new column in pandas

Posted by v6ylcynt on 2023-04-18 in Other

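I want to calculate the change in each person's mental health score between two time points.
Each user has a name and mental health scores at 3 different time points. I want to calculate the change in mental health score between time point 3 and time point 1.
Here is an example of the df I am starting with: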

User   Timepoint   Mental Health Score
Bill       1               5
Bill       2               10
Bill       3               15
Wiz        1               10
Wiz        2               10
Wiz        3               15
Sam        1               5
Sam        2               5
Sam        3               5
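To make the answers reproducible, here is a minimal sketch that builds the sample DataFrame from the table above. The column names are copied from the question; only the variable name df is assumed.

```python
import pandas as pd

# Sample data from the question: 3 users, each scored at timepoints 1, 2 and 3
df = pd.DataFrame({
    "User": ["Bill"] * 3 + ["Wiz"] * 3 + ["Sam"] * 3,
    "Timepoint": [1, 2, 3] * 3,
    "Mental Health Score": [5, 10, 15, 10, 10, 15, 5, 5, 5],
})
print(df)
```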

This is the desired output:

User   Timepoint   Mental Health Score      Change in Mental Health (TP1 and 3) 
Bill       1               5                               
Bill       2               10
Bill       3               15                              10
Wiz        1               10
Wiz        2               10
Wiz        3               15                              5
Sam        1               5
Sam        2               5
Sam        3               5                               0

Does anyone know how to do this?

Answer #1 (m1m5dgzv)

You can do this with shift() and np.where():

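import numpy as np
# Assumes rows are sorted by User, then Timepoint (TP1, TP2, TP3 for every user)
df['Change in Mental Health (TP1 and 3)'] = df['Mental Health Score'] - df.groupby('User')['Mental Health Score'].shift(2)
# Keep the change only on the TP3 rows; the other rows become NaN (blank), as in the desired output
df['Change in Mental Health (TP1 and 3)'] = np.where(df['Timepoint'] == 3, df['Change in Mental Health (TP1 and 3)'], np.nan)
df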
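A self-contained run of the shift()/np.where() idea, sketched under the assumption that every user has exactly one row for each of TP1, TP2 and TP3. The rows are deliberately shuffled here (an illustration, not the question's order) to show why sorting by User and Timepoint first matters: shift(2) only reaches TP1 from the TP3 row when the rows are in that order.

```python
import numpy as np
import pandas as pd

# The question's data, shuffled on purpose to show the effect of sorting
df = pd.DataFrame({
    "User": ["Sam", "Bill", "Wiz", "Bill", "Sam", "Wiz", "Bill", "Wiz", "Sam"],
    "Timepoint": [3, 1, 2, 3, 1, 1, 2, 3, 2],
    "Mental Health Score": [5, 5, 10, 15, 5, 10, 10, 15, 5],
})

# Put each user's rows in Timepoint order so shift(2) on the TP3 row lands on TP1
df = df.sort_values(["User", "Timepoint"])
col = "Change in Mental Health (TP1 and 3)"
# Grouped shift: the score two rows earlier within the same user (NaN at each user's start)
tp1 = df.groupby("User")["Mental Health Score"].shift(2)
# Keep the difference only on TP3 rows; every other row stays NaN (blank)
df[col] = np.where(df["Timepoint"] == 3, df["Mental Health Score"] - tp1, np.nan)
print(df)
```

Grouping the shift is a small safety choice: it guarantees a user's TP3 row is never compared with another user's row, even if the ordering assumption is violated.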

Answer #2 (sqxo8psd)

Try groupby and where:

#sort by Timepoint if needed
#df = df.sort_values("Timepoint")

changes = df.groupby("User")["Mental Health Score"].transform('last')-df.groupby("User")["Mental Health Score"].transform('first')
df["Change"] = changes.where(df["Timepoint"].eq(3))

>>> df
   User  Timepoint  Mental Health Score  Change
0  Bill          1                    5     NaN
1  Bill          2                   10     NaN
2  Bill          3                   15    10.0
3   Wiz          1                   10     NaN
4   Wiz          2                   10     NaN
5   Wiz          3                   15     5.0
6   Sam          1                    5     NaN
7   Sam          2                    5     NaN
8   Sam          3                    5     0.0

Answer #3 (woobm2wo)

As already pointed out in the comments, you can group your DataFrame by User and compute the difference in Mental Health Score.
Here is a piece of code to demonstrate:

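# Assumes each user's rows are already ordered by Timepoint
def _overall_change(scores):
    # Last score minus first score for one user
    return scores.iloc[-1] - scores.iloc[0]

person = df.groupby('User')['Mental Health Score'].agg(_overall_change)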
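The aggregation gives one value per user, indexed by User, rather than a new column. A minimal sketch of one way to put it back into the DataFrame on the TP3 rows only, using Series.map on the User column (the sorting step and the column name Change are assumptions added for illustration):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User": ["Bill"] * 3 + ["Wiz"] * 3 + ["Sam"] * 3,
    "Timepoint": [1, 2, 3] * 3,
    "Mental Health Score": [5, 10, 15, 10, 10, 15, 5, 5, 5],
})

def _overall_change(scores):
    # Last score minus first score; relies on rows being ordered by Timepoint
    return scores.iloc[-1] - scores.iloc[0]

# One value per user, indexed by User
person = (df.sort_values(["User", "Timepoint"])
            .groupby("User")["Mental Health Score"]
            .agg(_overall_change))
# Look up each row's user in `person`, then blank out every row except TP3
df["Change"] = df["User"].map(person).where(df["Timepoint"].eq(3))
print(person)
print(df)
```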

Answer #4 (h9vpoimq)

Using groupby and merge:

g = df.sort_values(by='Timepoint').groupby('User')['Mental Health Score']

s = pd.concat({3: g.last()-g.first()})
#    User
# 3  Bill    10
#    Sam      0
#    Wiz      5
# Name: Mental Health Score, dtype: int64

df.merge(s, left_on=['Timepoint', 'User'], right_index=True, how='left')

Output:

User  Timepoint  Mental Health Score_x  Mental Health Score_y
0  Bill          1                      5                    NaN
1  Bill          2                     10                    NaN
2  Bill          3                     15                   10.0
3   Wiz          1                     10                    NaN
4   Wiz          2                     10                    NaN
5   Wiz          3                     15                    5.0
6   Sam          1                      5                    NaN
7   Sam          2                      5                    NaN
8   Sam          3                      5                    0.0
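The _x/_y suffixes appear because the merged Series keeps the name Mental Health Score, which clashes with the existing column. A minimal sketch of the same merge, assuming you would rather get a single new column; the name Change is an illustrative choice, set with Series.rename:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User": ["Bill"] * 3 + ["Wiz"] * 3 + ["Sam"] * 3,
    "Timepoint": [1, 2, 3] * 3,
    "Mental Health Score": [5, 10, 15, 10, 10, 15, 5, 5, 5],
})

g = df.sort_values(by="Timepoint").groupby("User")["Mental Health Score"]
# MultiIndex (3, User) -> change; renaming the Series names the merged column
s = pd.concat({3: g.last() - g.first()}).rename("Change")
# Left merge on (Timepoint, User): only the TP3 rows find a match, the rest get NaN
out = df.merge(s, left_on=["Timepoint", "User"], right_index=True, how="left")
print(out)
```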

Answer #5 (tez616oj)

Here is another possible solution:

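import pandas as pd

def calculate_change(mhs):
    # Last score minus first score for one user
    mhs = list(mhs)
    return mhs[-1] - mhs[0]

# First/last are positional, so order each user's rows by Timepoint
df = df.sort_values(["User", "Timepoint"])
diff = df.groupby('User')['Mental Health Score'].agg(calculate_change)

# Attach each user's change to all of that user's rows, then rename the columns
df = pd.merge(df, diff, how='left', left_on='User', right_index=True)
df.columns = ['User', 'Timepoint', 'Mental Health Score', 'Change']
# Keep the change only on the TP3 rows (the others become NaN)
df['Change'] = df['Change'].loc[df['Timepoint']==3]
# sort_values reordered the rows; restore the original order before printing
print(df.sort_index())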

Output:

User  Timepoint  Mental Health Score  Change
0  Bill          1                    5     NaN
1  Bill          2                   10     NaN
2  Bill          3                   15    10.0
3   Wiz          1                   10     NaN
4   Wiz          2                   10     NaN
5   Wiz          3                   15     5.0
6   Sam          1                    5     NaN
7   Sam          2                    5     NaN
8   Sam          3                    5     0.0
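The answers above pick TP1 and TP3 by row position (first/last), which silently gives a different difference if a user is missing a timepoint. A minimal sketch of an alternative not used in any answer here: reshape with DataFrame.pivot and subtract the columns labeled 3 and 1 directly. It assumes each (User, Timepoint) pair occurs only once, since pivot raises an error on duplicates.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User": ["Bill"] * 3 + ["Wiz"] * 3 + ["Sam"] * 3,
    "Timepoint": [1, 2, 3] * 3,
    "Mental Health Score": [5, 10, 15, 10, 10, 15, 5, 5, 5],
})

# Wide table: one row per User, one column per Timepoint value
wide = df.pivot(index="User", columns="Timepoint", values="Mental Health Score")
# Select TP3 and TP1 by their Timepoint label rather than by row position
change = wide[3] - wide[1]
# Put each user's change on that user's TP3 row only
df["Change"] = df["User"].map(change).where(df["Timepoint"].eq(3))
print(df)
```

If a user has no TP1 or TP3 row, the pivot leaves NaN in that cell, so the change for that user comes out as NaN instead of a misleading number.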
