pandas: rolling window groupby on panel data

Posted by bvn4nwqk on 2023-01-15 in Other


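The goal is to perform grouped rolling-window calculations on panel data. Where possible, apply and similar functions should be avoided, because they run slowly when there are many groups of observations. Consider the following longitudinal DataFrame of customers with monthly sales: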

import numpy as np
import pandas as pd

customers = pd.Series(['a', 'b', 'c', 'd']).rename('customer')
# Monthly periods 2018-01 .. 2018-12 (same months as date_range(..., freq='M').to_period('M'))
date_range = pd.period_range('2018-01', '2018-12', freq='M').rename('month')
example_df = pd.DataFrame(index=pd.MultiIndex.from_product([customers, date_range]))
example_df['sales'] = (np.random.random(example_df.shape[0]) > 0.9) * (np.random.randint(1, 25, example_df.shape[0]) * 100)

1. Why does the following code raise an error even though month is the name of an index level?

example_df.groupby('customer').rolling(3, on='month').sales.sum()

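ValueError: invalid on specified as month, must be a column (of DataFrame), an Index or None
One workaround is to use .reset_index to turn month into a column. As far as I know this is the simplest solution, but it is still unclear to me why resetting the index is necessary.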

example_df.reset_index('month').groupby('customer').rolling(3, on='month').sales.sum()
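The error message itself hints at the cause: on is accepted only as a column label or an Index object, so the name of an index level does not qualify. The following is a minimal sketch of both the failure and the workaround, using a small deterministic frame (df, with made-up sales values) in place of the random example_df:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for example_df; the sales values are made up.
customers = pd.Index(['a', 'b'], name='customer')
months = pd.period_range('2018-01', '2018-06', freq='M', name='month')
df = pd.DataFrame(
    {'sales': np.arange(12) * 100},
    index=pd.MultiIndex.from_product([customers, months]),
)

# 'month' is an index level here, not a column, so on='month' is rejected.
try:
    df.groupby('customer').rolling(3, on='month').sales.sum()
    raised = False
except ValueError:
    raised = True

# After moving 'month' into a column, the same call is accepted.
result = (
    df.reset_index('month')
      .groupby('customer')
      .rolling(3, on='month')
      .sales.sum()
)
print(raised)
print(result)
```

With an integer window such as 3, the rows are counted positionally within each group, so the frame is assumed to be sorted by month within each customer already.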

2. I found that the following code performs the operation correctly, but it creates a new level in the MultiIndex. Why does it do that?

example_df.groupby('customer').rolling(3).sales.sum()

A workaround is to assign only .values, but ignoring the index may not always be feasible.

example_df['rolling_sum'] = example_df.groupby('customer').rolling(3).sales.sum().values
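To make the extra level visible, here is a small sketch on a deterministic frame (df, with made-up sales values, standing in for example_df). It inspects the index of the rolled result and then repeats the .values assignment:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for example_df; the sales values are made up.
customers = pd.Index(['a', 'b'], name='customer')
months = pd.period_range('2018-01', '2018-04', freq='M', name='month')
df = pd.DataFrame(
    {'sales': [100, 0, 200, 0, 0, 300, 0, 400]},
    index=pd.MultiIndex.from_product([customers, months]),
)

rolled = df.groupby('customer').rolling(3).sales.sum()

# The group key is prepended to the original (customer, month) index,
# so 'customer' shows up twice in the result.
level_names = list(rolled.index.names)
print(level_names)

# Positional assignment: this matches rows by position only, so it is
# correct here because df is already ordered by customer.
df['rolling_sum'] = rolled.values
print(df)
```

Because .values discards the labels, this approach depends on the frame's row order matching the order in which groupby returns the groups.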

pandas

Source: https://stackoverflow.com/questions/75088079/rolling-window-groupby-on-panel-data

1 Answer

yquaqz181#

For the second question, you do not have to discard the whole index; just use droplevel(0):

example_df.groupby('customer').rolling(3).sales.sum().droplevel(0)

Output:

customer  month  
a         2018-01       NaN
          2018-02       NaN
          2018-03       0.0
          2018-04       0.0
          2018-05       0.0
          2018-06       0.0
          2018-07       0.0
...
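Once the duplicated level is dropped, the result can be assigned back by label rather than by position. The sketch below assumes a small deterministic frame (df, with made-up sales values) whose customers are deliberately listed out of sorted order, to show that the assignment aligns on (customer, month):

```python
import numpy as np
import pandas as pd

# Stand-in for example_df with customers in non-sorted order; values are made up.
customers = pd.Index(['b', 'a'], name='customer')
months = pd.period_range('2018-01', '2018-04', freq='M', name='month')
df = pd.DataFrame(
    {'sales': [0, 300, 0, 400, 100, 0, 200, 0]},
    index=pd.MultiIndex.from_product([customers, months]),
)

# Drop the prepended group key so the index is (customer, month) again.
rolled = df.groupby('customer').rolling(3).sales.sum().droplevel(0)

# Assigning a Series aligns on the index labels, so row order does not matter.
df['rolling_sum'] = rolled
print(df)
```

An equivalent spelling is .reset_index(level=0, drop=True) in place of .droplevel(0).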

Posted 2023-01-15
