Get the minimum value greater than a given value from a pandas DataFrame column

yx2lnoni  posted on 2023-04-28  in Other

I have a pandas DataFrame and want to create a column min_date_after_ref_date that shows, for each id, the earliest date after a given ref_date. I have the following code.

from datetime import datetime
import pandas as pd

ref_date = datetime.strptime('2023-04-21 12:00', '%Y-%m-%d %H:%M')
df = pd.DataFrame({'id':[1,2,1,1,3], 'time_stamp': ['2023-04-19 12:05', '2023-04-21 12:45',
                                                 '2023-04-21 15:45', '2023-04-23 13:15', '2023-04-18 12:05']})
df = df.assign(time_stamp=pd.to_datetime(df.time_stamp))
# Rows at or before ref_date are filtered out before the groupby, so they get no value here
df = df.assign(min_date_after_ref_date=df[df.time_stamp>ref_date].groupby('id').time_stamp.transform('min'))

This is what I get:

   id          time_stamp min_date_after_ref_date
0   1 2023-04-19 12:05:00                     NaT
1   2 2023-04-21 12:45:00     2023-04-21 12:45:00
2   1 2023-04-21 15:45:00     2023-04-21 15:45:00
3   1 2023-04-23 13:15:00     2023-04-21 15:45:00
4   3 2023-04-18 12:05:00                     NaT

But I want the first row to show 2023-04-21 15:45:00 as well (instead of NaT), so that every id always has the same min_date_after_ref_date value. How can I change this?

k97glaaz  #1

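Use Series.where to set values to NaT where the condition is False. Unlike boolean filtering, this keeps every row, so transform('min') can broadcast each id's minimum back to all of its rows: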

out = df.assign(min_date_after_ref_date=df.time_stamp.where(df.time_stamp>ref_date)
                                           .groupby(df['id'])
                                           .transform('min'))
print(out)
   id          time_stamp min_date_after_ref_date
0   1 2023-04-19 12:05:00     2023-04-21 15:45:00
1   2 2023-04-21 12:45:00     2023-04-21 12:45:00
2   1 2023-04-21 15:45:00     2023-04-21 15:45:00
3   1 2023-04-23 13:15:00     2023-04-21 15:45:00
4   3 2023-04-18 12:05:00                     NaT
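
To see why this works, it may help to look at the intermediate Series. The following is only a sketch that rebuilds the question's frame; the names `filtered`, `masked`, and `per_id_min` are illustrative and do not appear in the original code.

```python
import pandas as pd

ref_date = pd.Timestamp('2023-04-21 12:00')
df = pd.DataFrame({
    'id': [1, 2, 1, 1, 3],
    'time_stamp': pd.to_datetime(['2023-04-19 12:05', '2023-04-21 12:45',
                                  '2023-04-21 15:45', '2023-04-23 13:15',
                                  '2023-04-18 12:05']),
})

# Boolean filtering drops rows 0 and 4, so the transform result has no
# entry for them; assign() then aligns on the index and fills NaT.
filtered = df[df.time_stamp > ref_date].groupby('id').time_stamp.transform('min')

# Series.where keeps all five rows and replaces non-matching ones with NaT.
masked = df.time_stamp.where(df.time_stamp > ref_date)

# min skips NaT within each group, so every row of an id gets the same value.
per_id_min = masked.groupby(df['id']).transform('min')

print(list(filtered.index))    # index labels that survived the filter
print(masked.isna().tolist())  # which rows were masked to NaT
print(per_id_min)
```

An id whose timestamps are all at or before ref_date (id 3 here) has only NaT in its group, so its result stays NaT.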

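Alternatively, use Series.map instead of transform: compute one minimum per id on the filtered rows, then map it onto the id column: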

out = df.assign(min_date_after_ref_date=df['id'].map(df[df.time_stamp>ref_date]
                                                     .groupby('id').time_stamp.min()))
print(out)
   id          time_stamp min_date_after_ref_date
0   1 2023-04-19 12:05:00     2023-04-21 15:45:00
1   2 2023-04-21 12:45:00     2023-04-21 12:45:00
2   1 2023-04-21 15:45:00     2023-04-21 15:45:00
3   1 2023-04-23 13:15:00     2023-04-21 15:45:00
4   3 2023-04-18 12:05:00                     NaT
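
As a quick sanity check, the two approaches can be compared directly. This is a minimal sketch under the assumption that the column names match the question; the helper names `min_after_where` and `min_after_map` are hypothetical and used only for illustration.

```python
import pandas as pd

def min_after_where(df, ref_date):
    # Mask rows at or before ref_date to NaT, then broadcast each id's minimum.
    masked = df['time_stamp'].where(df['time_stamp'] > ref_date)
    return masked.groupby(df['id']).transform('min')

def min_after_map(df, ref_date):
    # Aggregate one minimum per id on the filtered rows, then look it up by id.
    per_id = df[df['time_stamp'] > ref_date].groupby('id')['time_stamp'].min()
    return df['id'].map(per_id)

ref_date = pd.Timestamp('2023-04-21 12:00')
df = pd.DataFrame({
    'id': [1, 2, 1, 1, 3],
    'time_stamp': pd.to_datetime(['2023-04-19 12:05', '2023-04-21 12:45',
                                  '2023-04-21 15:45', '2023-04-23 13:15',
                                  '2023-04-18 12:05']),
})

a = min_after_where(df, ref_date)
b = min_after_map(df, ref_date)

# Series.equals compares values and index (missing values in the same
# position count as equal); the Series names are not compared.
print(a.equals(b))
```

The map variant aggregates to one value per id before the lookup, while the where variant stays row-aligned throughout; which one reads better is largely a matter of taste.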
