django: Calculate deviation % using values from a pandas DataFrame

5gfr0r5j  posted on 2022-11-26  in  Go

I'm fairly new to Python, and I have the following DataFrame:

    setting_id  subject_id  seconds  result_id  owner_id  average  duration_id
0            7           1        0     1680.5       2.0   24.000          1.0
1            7           1     3600     1690.5       2.0   46.000          2.0
2            7           1    10800     1700.5       2.0  101.000          4.0
3            7           2        0     1682.5       2.0   12.500          1.0
4            7           2     3600     1692.5       2.0   33.500          2.0
5            7           2    10800     1702.5       2.0   86.500          4.0
6            7           3        0     1684.5       2.0    8.500          1.0
7            7           3     3600     1694.5       2.0   15.000          2.0
8            7           3    10800     1704.5       2.0   34.000          4.0

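What I need to do is calculate the deviation (%) of the average for rows where seconds is not 0, relative to the average where seconds is 0, within the same subject_id and setting_id.
For setting_id == 7 & subject_id == 1 this would be:
(result/baseline)*100
------> at 3600 seconds: (46/24)*100 = +192%
------> at 10800 seconds: (101/24)*100 = +421%
.... baseline = the average where seconds is 0
.... result = the average where seconds is not 0
The resulting df should look like this: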

   setting_id  subject_id  seconds  owner_id  average  deviation  duration_id
0           7           1        0         2       24          0            1
1           7           1     3600         2       46        192            2
2           7           1    10800         2      101        421            4

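I want to use these calculations to draw a regression plot of the deviation from the baseline (using seaborn).
I've been playing with this df for 2 days and have tried various for loops, but I just can't find the right approach.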


ru9i0ody1#

You can use:

# identify rows with 0
m = df['seconds'].eq(0)
# compute the sum of rows with 0
s = (df['average'].where(m)
     .groupby([df['setting_id'], df['subject_id']])
     .sum()
    )

# compute the deviation per group
deviation = (
 df[['setting_id', 'subject_id']]
 .merge(s, left_on=['setting_id', 'subject_id'], right_index=True, how='left')
 ['average']
 .rdiv(df['average']).mul(100)
 .round().astype(int) # optional
 .mask(m, 0)
)

df['deviation'] = deviation
# or
# out = df.assign(deviation=deviation)

Output:

   setting_id  subject_id  seconds  result_id  owner_id  average  duration_id  deviation
0           7           1        0     1680.5       2.0     24.0          1.0          0
1           7           1     3600     1690.5       2.0     46.0          2.0        192
2           7           1    10800     1700.5       2.0    101.0          4.0        421
3           7           2        0     1682.5       2.0     12.5          1.0          0
4           7           2     3600     1692.5       2.0     33.5          2.0        268
5           7           2    10800     1702.5       2.0     86.5          4.0        692
6           7           3        0     1684.5       2.0      8.5          1.0          0
7           7           3     3600     1694.5       2.0     15.0          2.0        176
8           7           3    10800     1704.5       2.0     34.0          4.0        400
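
As an alternative to the merge, the per-group baseline can also be broadcast back onto every row with `groupby(...).transform`. The following is only a sketch, not part of the original answer. It rebuilds the relevant columns of the sample data, and it assumes every (setting_id, subject_id) group has exactly one row with seconds == 0 and that the baseline is never 0. The names `is_baseline` and `baseline` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (only the columns needed here).
df = pd.DataFrame({
    "setting_id": [7] * 9,
    "subject_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "seconds": [0, 3600, 10800] * 3,
    "average": [24.0, 46.0, 101.0, 12.5, 33.5, 86.5, 8.5, 15.0, 34.0],
})

# True on the baseline rows (seconds == 0).
is_baseline = df["seconds"].eq(0)

# Keep 'average' only on baseline rows (NaN elsewhere), then broadcast the
# first non-NaN value of each (setting_id, subject_id) group to all its rows.
# Assumption: each group has exactly one baseline row.
baseline = (
    df["average"].where(is_baseline)
    .groupby([df["setting_id"], df["subject_id"]])
    .transform("first")
)

# (result / baseline) * 100, rounded; baseline rows are set to 0 as in the
# desired output from the question.
df["deviation"] = (
    df["average"].div(baseline).mul(100)
    .round().astype(int)
    .mask(is_baseline, 0)
)

print(df)
```

With this sample data the `deviation` column should match the output shown above. If a group had no baseline row, `baseline` would be NaN for that group and the `astype(int)` step would fail, so that case would need separate handling.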
