I have a CSV input file that looks like this:
date,server_cluster,server,cpu,watt
2023-03-29 12:00:00,cluster1,server1,cpu1,104
2023-03-29 12:00:00,cluster1,server1,cpu2,105
2023-03-29 12:00:00,cluster1,server2,cpu1,122
2023-03-29 12:00:00,cluster1,server2,cpu2,103
2023-03-29 12:00:00,cluster2,server1,cpu1,105
2023-03-29 12:00:00,cluster2,server1,cpu2,154
2023-03-29 12:00:00,cluster2,server2,cpu1,122
2023-03-29 12:00:00,cluster2,server2,cpu2,112
2023-03-29 12:15:00,cluster1,server1,cpu1,134
2023-03-29 12:15:00,cluster1,server1,cpu2,145
2023-03-29 12:15:00,cluster1,server2,cpu1,121
2023-03-29 12:15:00,cluster1,server2,cpu2,107
2023-03-29 12:15:00,cluster2,server1,cpu1,167
2023-03-29 12:15:00,cluster2,server1,cpu2,103
2023-03-29 12:15:00,cluster2,server2,cpu1,122
2023-03-29 12:15:00,cluster2,server2,cpu2,177
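So data is collected every 15 minutes, but the problem is that a cluster can be down for a while because of reboots, outages, and so on, which means the data for those 15-minute windows is missing.
My question is: how can I add the missing time windows to my DataFrame? The best I've come up with is this monstrosity: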
df.groupby(["server_cluster", "server", "cpu"]).resample("15min").last().ffill().reset_index("date").reset_index(drop=True).set_index("date")
Is there a cleaner solution?
1 Answer
IIUC, use:
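One way to do this, sketched here as an illustration and not as the answerer's original snippet: build the full grid of expected timestamps for every (server_cluster, server, cpu) combination, left-merge the real data onto it, and forward-fill within each group. The helper name `fill_missing_windows`, the shortened sample data (the 12:15 window is deliberately left out), and the choice to forward-fill rather than leave NaN are all assumptions. It also assumes pandas 1.2 or later for `how="cross"`.

```python
import io
import pandas as pd

# Hypothetical sample: same layout as the question, with the 12:15 window missing.
csv_text = """date,server_cluster,server,cpu,watt
2023-03-29 12:00:00,cluster1,server1,cpu1,104
2023-03-29 12:00:00,cluster1,server1,cpu2,105
2023-03-29 12:30:00,cluster1,server1,cpu1,134
2023-03-29 12:30:00,cluster1,server1,cpu2,145
"""

KEYS = ["server_cluster", "server", "cpu"]


def fill_missing_windows(df, freq="15min"):
    """Add rows for missing time windows and forward-fill watt per CPU."""
    # Every timestamp that should exist between the first and last reading.
    all_times = pd.date_range(df["date"].min(), df["date"].max(), freq=freq)
    # Match the dtype of the original column so the merge keys line up.
    all_times = all_times.astype(df["date"].dtype)

    # Every (cluster, server, cpu) combination that appears in the data.
    combos = df[KEYS].drop_duplicates()

    # Cartesian product: one row per timestamp per combination.
    grid = pd.DataFrame({"date": all_times}).merge(combos, how="cross")

    # Attach the real readings; windows with no reading get NaN.
    out = grid.merge(df, on=["date"] + KEYS, how="left")

    # Carry the last known reading forward within each CPU's own series.
    out["watt"] = out.groupby(KEYS)["watt"].ffill()
    return out


df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
filled = fill_missing_windows(df)
print(filled)
```

If the gaps should stay visible instead of being filled, drop the `ffill` line and the missing windows will show up as NaN in `watt`. Note that `watt` becomes a float column because NaN is introduced during the merge.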