python-3.x Pandas: How do I find the lag difference in days between each patient's treatments?

Posted by c86crjj0 on 2022-12-24 in Python

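Please help. In my sample, each patient has multiple treatments, and each treatment starts on a specific day. My goal is to calculate the difference in days between consecutive treatments.
Also, all my patients are in a single column, so the lag difference has to reset whenever a new patient begins.
My current dataset format: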

import pandas as pd

df2 = pd.DataFrame({'patient': ['one', 'one', 'one', 'two', 'two', 'two'],
                    'treatment_schedule': ['treatment1', 'treatment2', 'treatment3', 'treatment1', 'treatment2', 'treatment3'],
                    'date': ['11/20/2022', '11/22/2022', '11/23/2022', '11/8/2022', '11/9/2022', '11/14/2022']})

df2

My desired dataset format:

# If a patient has no prior treatment, either null or zero is acceptable
df3 = pd.DataFrame({'patient': ['one', 'one', 'one', 'two', 'two', 'two'],
                    'treatment_schedule': ['treatment1', 'treatment2', 'treatment3', 'treatment1', 'treatment2', 'treatment3'],
                    'date': ['11/20/2022', '11/22/2022', '11/23/2022', '11/8/2022', '11/9/2022', '11/14/2022'],
                    'lag_diff_days_between_each_treatment': [0, 2, 1, 0, 1, 5]})

df3

Answer 1 (fnvucqvd)

Use DataFrameGroupBy.diff, convert the resulting timedeltas to days with Series.dt.days, and replace the missing values with 0 using Series.fillna:

df2['date'] = pd.to_datetime(df2['date'])

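df2['lag_diff_days_between_each_treatment'] = (df2.groupby('patient')['date']
                                                  .diff()      # difference within each patient; first row per patient is NaT
                                                  .dt.days     # timedelta -> number of days
                                                  .fillna(0)   # no prior treatment -> 0
                                                  .astype(int))  # the downcast argument of fillna is deprecated in recent pandas
print(df2)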
  patient treatment_schedule       date  lag_diff_days_between_each_treatment
0     one         treatment1 2022-11-20                                     0
1     one         treatment2 2022-11-22                                     2
2     one         treatment3 2022-11-23                                     1
3     two         treatment1 2022-11-08                                     0
4     two         treatment2 2022-11-09                                     1
5     two         treatment3 2022-11-14                                     5
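
The question also allows null instead of zero for each patient's first treatment. A minimal sketch of that variant, assuming pandas' nullable `Int64` dtype is acceptable for the column (the variable name `lag_nullable` is only illustrative):

```python
import pandas as pd

df2 = pd.DataFrame({
    'patient': ['one', 'one', 'one', 'two', 'two', 'two'],
    'date': ['11/20/2022', '11/22/2022', '11/23/2022',
             '11/8/2022', '11/9/2022', '11/14/2022'],
})
df2['date'] = pd.to_datetime(df2['date'])

# Skip fillna: the first treatment of each patient stays missing (<NA>),
# while the remaining values are stored as integers rather than floats.
lag_nullable = (df2.groupby('patient')['date']
                   .diff()
                   .dt.days
                   .astype('Int64'))
print(lag_nullable)
```

Keeping the missing value distinguishes "no prior treatment" from "two treatments on the same day", which would both appear as 0 otherwise.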

Answer 2 (wnvonmuf)

s = (pd.to_datetime(df2['date']).groupby(df2['patient']).diff(1)
     .div(pd.Timedelta('1day')).fillna(0).astype('int'))
df3 = df2.assign(lag_diff_days_between_each_treatment=s)

df3

  patient treatment_schedule        date  lag_diff_days_between_each_treatment
0     one         treatment1  11/20/2022                                     0
1     one         treatment2  11/22/2022                                     2
2     one         treatment3  11/23/2022                                     1
3     two         treatment1   11/8/2022                                     0
4     two         treatment2   11/9/2022                                     1
5     two         treatment3  11/14/2022                                     5
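
The two answers convert timedeltas to days in different ways, and the results agree here only because the sample dates carry no time of day. A small sketch of the difference, using hypothetical timestamps that are not part of the question's data:

```python
import pandas as pd

# Hypothetical timestamps with a time-of-day component (not from the question)
ts = pd.Series(pd.to_datetime(['2022-11-20 08:00', '2022-11-22 20:00']))
delta = ts.diff()  # second element is 2 days 12 hours

# .dt.days keeps only the whole-day component of each timedelta
whole_days = delta.dt.days

# Dividing by a one-day Timedelta gives elapsed time in days as a float
fractional_days = delta.div(pd.Timedelta('1day'))

print(whole_days.iloc[1], fractional_days.iloc[1])
```

For date-only columns like the one in the question, either approach should give the same integers after the final cast.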
