Adding missing data with ffill on a Pandas DataFrame that has a duplicated DatetimeIndex

Posted by p1iqtdky on 2023-04-04 in Other

I have a CSV input file that looks like this:

date,server_cluster,server,cpu,watt
2023-03-29 12:00:00,cluster1,server1,cpu1,104
2023-03-29 12:00:00,cluster1,server1,cpu2,105
2023-03-29 12:00:00,cluster1,server2,cpu1,122
2023-03-29 12:00:00,cluster1,server2,cpu2,103
2023-03-29 12:00:00,cluster2,server1,cpu1,105
2023-03-29 12:00:00,cluster2,server1,cpu2,154
2023-03-29 12:00:00,cluster2,server2,cpu1,122
2023-03-29 12:00:00,cluster2,server2,cpu2,112
2023-03-29 12:15:00,cluster1,server1,cpu1,134
2023-03-29 12:15:00,cluster1,server1,cpu2,145
2023-03-29 12:15:00,cluster1,server2,cpu1,121
2023-03-29 12:15:00,cluster1,server2,cpu2,107
2023-03-29 12:15:00,cluster2,server1,cpu1,167
2023-03-29 12:15:00,cluster2,server1,cpu2,103
2023-03-29 12:15:00,cluster2,server2,cpu1,122
2023-03-29 12:15:00,cluster2,server2,cpu2,177

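So data is collected every 15 minutes. The problem is that, because of reboots, downtime, and so on, a cluster can be down for a while, which means the data for those 15-minute windows is missing.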
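To make the gap concrete, here is a minimal sketch that lists which 15-minute slots are missing for each (server_cluster, server, cpu) combination. The data is a shortened, made-up sample in the same format as the CSV above, and the names `expected` and `missing` are only illustrative. It assumes the `date` column has been parsed into the DatetimeIndex.

```python
import io
import pandas as pd

# Shortened, made-up sample in the question's format:
# cluster2 has no row for the 12:15 window.
csv_text = """date,server_cluster,server,cpu,watt
2023-03-29 12:00:00,cluster1,server1,cpu1,104
2023-03-29 12:00:00,cluster2,server1,cpu1,105
2023-03-29 12:15:00,cluster1,server1,cpu1,134
2023-03-29 12:30:00,cluster1,server1,cpu1,120
2023-03-29 12:30:00,cluster2,server1,cpu1,167
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

keys = ["server_cluster", "server", "cpu"]

# Every 15-minute slot between the first and last timestamp in the file.
expected = pd.date_range(df.index.min(), df.index.max(), freq="15min")

# For each group, the expected slots that have no row.
missing = {
    group: expected.difference(part.index)
    for group, part in df.groupby(keys)
}

for group, slots in missing.items():
    print(group, list(slots))
```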
My question is: how can I add the missing time windows to my DataFrame? The best I have come up with is this monster:

df.groupby(["server_cluster", "server", "cpu"]).resample("15min").last().ffill().reset_index("date").reset_index(drop=True).set_index("date")

Is there a cleaner solution?

4xrmg8kj1#

IIUC, use:

# assumes df uses the parsed date column as its DatetimeIndex
df1 = (df.groupby(["server_cluster", "server", "cpu"])
         .resample("15min").last().ffill().droplevel([0,1,2]))
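For anyone who wants to try this end to end, below is a self-contained sketch of the same groupby + resample + ffill idea. It is a variation, not the exact line above: it selects the `watt` column before resampling and forward-fills within each group, so the result does not depend on how a given pandas version treats the grouping columns inside `groupby(...).resample(...)`. The sample data is shortened and made up, and `keys` and `filled` are illustrative names.

```python
import io
import pandas as pd

# Shortened, made-up sample: cluster2 is missing the 12:15 window.
csv_text = """date,server_cluster,server,cpu,watt
2023-03-29 12:00:00,cluster1,server1,cpu1,104
2023-03-29 12:00:00,cluster2,server1,cpu1,105
2023-03-29 12:15:00,cluster1,server1,cpu1,134
2023-03-29 12:30:00,cluster1,server1,cpu1,120
2023-03-29 12:30:00,cluster2,server1,cpu1,167
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

keys = ["server_cluster", "server", "cpu"]

filled = (
    df.groupby(keys)["watt"]   # one time series per cluster/server/cpu
      .resample("15min")       # regular 15-minute grid inside each group
      .last()                  # empty windows become NaN
      .groupby(level=keys)     # forward-fill without crossing group boundaries
      .ffill()
      .reset_index(level=keys) # keys back to columns, date stays as the index
)

print(filled)
```

Note that resampling per group only creates windows between that group's own first and last timestamp, so an outage at the very start or end of the file is not filled in by this approach. The filled `watt` column also becomes float, because the empty windows pass through NaN before being filled.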
