Pandas: restart the rolling mean at each new MultiIndex value

elcex8rz  posted on 2022-12-16  in Other

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Team':['A','A','A','A','B','B','B','B'],
                   'Date':list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score':[7,3,3,6,7,3,7,5],
                  }).set_index(['Team', 'Date'])

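I want to add a rolling mean column that resets whenever the level-0 index takes a new value. The simple code below does not work, because the rolling mean carries over from one index value to the next: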

df['Avg'] = df['Score'].rolling(window=2).mean()

                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  6.5
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

How can I get the following DataFrame instead?

                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  NaN
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

Thanks.

taor4pac  #1

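Use df.groupby, and take .values when assigning to the new column. Selecting ['Score'] first keeps the result one-dimensional even if df already has other columns:

df['Avg'] = df.groupby('Team')['Score'].rolling(window=2).mean().values

This produces:

                 Score  Avg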
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  NaN
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

xa9qqrwz  #2

Use groupby rolling mean on level='Team', then droplevel so that the index aligns correctly:

df['Avg'] = (
    df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
)

df

                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  NaN
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0
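
To see why droplevel(0) is needed, it can help to look at the index of the intermediate result. The following is a minimal sketch on the question's frame; the names `rolled` and `aligned` are only illustrative and do not appear in the original answer:

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Date': list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score': [7, 3, 3, 6, 7, 3, 7, 5],
                   }).set_index(['Team', 'Date'])

# groupby(...).rolling(...) prepends the group key to the original index,
# so the result has three levels: Team (group key), Team, Date.
rolled = df.groupby(level='Team')['Score'].rolling(window=2).mean()
print(rolled.index.nlevels)

# Dropping the prepended level restores the (Team, Date) index of df,
# so the assignment below is matched row by row on index labels.
aligned = rolled.droplevel(0)
print(list(aligned.index.names))

df['Avg'] = aligned
print(df)
```

Without the droplevel step the three-level index would not match the two-level index of df, which is why the first answer falls back to positional assignment through .values.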

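The advantage of droplevel over values is that the result is aligned on the index: the assignment matches rows by index label rather than by position.
Given an unordered DataFrame such as: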

df = pd.DataFrame({'Team': ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
                   'Date': list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score': [7, 7, 7, 8, 1, 2, 1, 2],
                   }).set_index(['Team', 'Date'])

df

                 Score
Team Date             
B    2021-01-01      7
     2021-01-02      7
     2021-01-03      7
     2021-01-04      8
A    2021-01-05      1
     2021-01-06      2
     2021-01-07      1
     2021-01-08      2

Note the difference between droplevel and values:

df['drop_level'] = (
    df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
)
df['values'] = (
    df.groupby(level='Team')['Score'].rolling(window=2).mean().values
)

                 Score  drop_level  values
Team Date                                 
B    2021-01-01      7         NaN     NaN
     2021-01-02      7         7.0     1.5
     2021-01-03      7         7.0     1.5
     2021-01-04      8         7.5     1.5  # These are the averages from A
A    2021-01-05      1         NaN     NaN
     2021-01-06      2         1.5     7.0  # These are the averages from B
     2021-01-07      1         1.5     7.0
     2021-01-08      2         1.5     7.5
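
As an alternative that is not part of either answer, groupby(...).transform returns a result that already carries the original index, so no droplevel or .values step is involved. A minimal sketch on the unordered frame above, assuming a per-group lambda is acceptable for the data size:

```python
import pandas as pd

df = pd.DataFrame({'Team': ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
                   'Date': list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score': [7, 7, 7, 8, 1, 2, 1, 2],
                   }).set_index(['Team', 'Date'])

# transform applies the rolling mean within each Team and returns a
# Series indexed like df, so group order does not affect the assignment.
df['Avg'] = (
    df.groupby(level='Team')['Score']
      .transform(lambda s: s.rolling(window=2).mean())
)
print(df)
```

The Avg values here should match the drop_level column in the table above. The lambda runs once per group in Python, so the droplevel version may be preferable when there are many groups.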
