numpy - How to use an if condition to compute a rolling mean in a groupby operation

o2gm4chl posted 11 months ago in Other

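I am trying to add code where, if the difference between the current date and the minimum date of each id is more than 3 months (90 days in the code below), the rolling mean uses a 21-day window; otherwise it uses a 7-day window.
Sample data: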

import pandas as pd
import numpy as np
np.random.seed(0)
dt = pd.DataFrame({'id' : [1,1,2,2,1],
                  'date' : ['2023-09-01', '2023-09-10', '2023-01-01', '2023-01-13', '2023-09-11'],
                  'rev' : np.random.randint(100, 150, 5)})
dt
    id  date        rev
0   1   2023-09-01  144
1   1   2023-09-10  147
2   2   2023-01-01  100
3   2   2023-01-13  103
4   1   2023-09-11  103

The rolling mean I am trying to compute is:

dt.groupby('id').transform(lambda x : x['rev'].rolling(window = '21D', min_periods = 1).mean() if pd.to_datetime('today') - x['date'].min() >= 90 else x['rev'].rolling(window = '7D', min_periods = 1).mean())


But I get KeyError: 'date'.
Can someone help me get the rolling mean working?
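The KeyError most likely comes from DataFrameGroupBy.transform calling the lambda on each column of a group as a separate Series, so x['date'] is looked up as a row label rather than a column. Separately, the condition compares a Timedelta with the bare integer 90, and the date column is still a string. A minimal sketch of checking the condition on its own, before any rolling, assuming a fixed reference date `ref` (hypothetical, 2023-10-15) in place of pd.to_datetime('today') so the result does not change with the day it runs; the rev values are the ones printed in the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question (rev values as printed there)
dt = pd.DataFrame({'id': [1, 1, 2, 2, 1],
                   'date': ['2023-09-01', '2023-09-10', '2023-01-01', '2023-01-13', '2023-09-11'],
                   'rev': [144, 147, 100, 103, 103]})
dt['date'] = pd.to_datetime(dt['date'])  # time-based windows need real datetimes

# Hypothetical fixed reference date standing in for pd.to_datetime('today')
ref = pd.Timestamp('2023-10-15')

# Earliest date of each id, broadcast back to every row of that id
first_date = dt.groupby('id')['date'].transform('min')

# Compare a Timedelta with a Timedelta (or use .dt.days >= 90), not with the bare int 90
use_21d = (ref - first_date) >= pd.Timedelta(days=90)
print(dt.assign(window=np.where(use_21d, '21D', '7D')))
```

With this reference date, id 1 (first date 44 days earlier) gets the 7-day window and id 2 (287 days earlier) gets the 21-day window.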


Answer #1 (wtzytmuj)

Convert date to datetime, make it a DatetimeIndex, and compare the number of days using Timedelta.days:

dt['date'] = pd.to_datetime(dt['date'])

out = (dt.set_index('date')
         .groupby('id')['rev']
         .apply(lambda x : x.rolling(window = '21D', min_periods = 1).mean() 
                          if (pd.to_datetime('today') - x.index.min()).days >= 90 
                          else x.rolling(window = '7D', min_periods = 1).mean()))

print (out)
id  date      
1   2023-09-01    144.0
    2023-09-10    147.0
    2023-09-11    125.0
2   2023-01-01    100.0
    2023-01-13    101.5
Name: rev, dtype: float64
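The printed output depends on the day the code runs. It is consistent with a date at which id 1's first date (2023-09-01) was less than 90 days old (7-day window) while id 2's (2023-01-01) was older (21-day window). A sketch of the same apply with the reference date passed in explicitly, so the result is reproducible; `rolling_by_age` and the dates in `ref` are hypothetical:

```python
import pandas as pd

# Sample data from the question, with date already converted
dt = pd.DataFrame({'id': [1, 1, 2, 2, 1],
                   'date': pd.to_datetime(['2023-09-01', '2023-09-10', '2023-01-01',
                                           '2023-01-13', '2023-09-11']),
                   'rev': [144, 147, 100, 103, 103]})

def rolling_by_age(x, ref):
    """Rolling mean of one id's rev (indexed by date): 21D if the id's first
    date is at least 90 days before ref, otherwise 7D."""
    window = '21D' if (ref - x.index.min()).days >= 90 else '7D'
    return x.rolling(window=window, min_periods=1).mean()

# Hypothetical fixed reference date instead of pd.to_datetime('today')
ref = pd.Timestamp('2023-10-15')
out = (dt.set_index('date')
         .groupby('id')['rev']
         .apply(rolling_by_age, ref=ref))
print(out)

# A later reference date puts id 1 past 90 days as well, so it switches to 21D
later = (dt.set_index('date')
           .groupby('id')['rev']
           .apply(rolling_by_age, ref=pd.Timestamp('2025-01-01')))
print(later)
```

With the later date, the 2023-09-10 row of id 1 becomes (144 + 147) / 2 = 145.5, because the 21-day window now reaches back to 2023-09-01.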

Or use transform, which returns the result in the original row order:

dt['date'] = pd.to_datetime(dt['date'])

out = (dt.set_index('date')
         .groupby('id')['rev']
         .transform(lambda x : x.rolling(window = '21D', min_periods = 1).mean() 
                           if (pd.to_datetime('today') - x.index.min()).days >= 90 
                           else x.rolling(window = '7D', min_periods = 1).mean()))

print (out)
date
2023-09-01    144.0
2023-09-10    147.0
2023-01-01    100.0
2023-01-13    101.5
2023-09-11    125.0
Name: rev, dtype: float64
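Because transform returns one value per input row in the original row order, its result can be attached to the original frame by position. A sketch under the same hypothetical fixed reference date; the column name `rev_roll` is also hypothetical:

```python
import pandas as pd

# Sample data from the question, with date already converted
dt = pd.DataFrame({'id': [1, 1, 2, 2, 1],
                   'date': pd.to_datetime(['2023-09-01', '2023-09-10', '2023-01-01',
                                           '2023-01-13', '2023-09-11']),
                   'rev': [144, 147, 100, 103, 103]})

ref = pd.Timestamp('2023-10-15')  # hypothetical fixed reference date

rolled = (dt.set_index('date')
            .groupby('id')['rev']
            .transform(lambda x: x.rolling('21D' if (ref - x.index.min()).days >= 90 else '7D',
                                           min_periods=1).mean()))

# rolled is indexed by date, but its rows follow dt's row order,
# so the plain values can be assigned positionally
dt['rev_roll'] = rolled.to_numpy()
print(dt)
```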


Answer #2 (57hvy0tb)

You can pass the on keyword to rolling inside groupby.apply:

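dt['date'] = pd.to_datetime(dt['date'])  # time-based windows need a datetime column

# group_keys=False keeps the original row labels, so the result aligns back to dt
# (note: this overwrites the original rev column)
dt['rev'] = (dt.groupby('id', group_keys=False)
               .apply(lambda x : x.rolling(window='21D', on='date', min_periods = 1)['rev'].mean()
                      if pd.to_datetime('today') - x['date'].min() >= pd.Timedelta('90D')
                      else x.rolling(window = '7D', on='date', min_periods = 1)['rev'].mean()
                     )
            )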

Output:

id       date    rev
0   1 2023-09-01  144.0
1   1 2023-09-10  147.0
2   2 2023-01-01  100.0
3   2 2023-01-13  101.5
4   1 2023-09-11  125.0
