I have a short DataFrame that I want to:
1. Sort by minute
2. Add rows for the missing minutes, from 09:05 to 09:20
3. Then resample into 5-minute bins
   time  cars     flow
0  9:07   737       In
1  9:06    22      Out
2  9:18    42       In
3  9:19    36  Unknown
What I tried:
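import pandas as pd

data = {'time': ["9:07", "9:06", "9:18", "9:19"],
        'cars': [737, 22, 42, 36],
        'flow': ["In", "Out", "In", "Unknown"]}
df = pd.DataFrame(data)
# One index entry per minute from 09:05 to 09:20 (dated today by default)
idx = pd.date_range("9:05", "09:20", freq="1min")
idx = idx.rename('time')
df = df.set_index('time')
df.index = pd.to_datetime(df.index)
# Missing minutes are filled with 0 in every column, including 'flow'
df = df.reindex(idx, fill_value=0)
# Total cars per flow in 5-minute bins ('5min' replaces the deprecated '5T' alias)
df = df.groupby('flow').resample('5min')['cars'].sum()
print(df)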
It returns:
flow     time
0        2020-10-21 09:05:00      0
         2020-10-21 09:10:00      0
         2020-10-21 09:15:00      0
         2020-10-21 09:20:00      0
In       2020-10-21 09:05:00    737
         2020-10-21 09:10:00      0
         2020-10-21 09:15:00     42
Out      2020-10-21 09:05:00     22
Unknown  2020-10-21 09:15:00     36
But what I want is:
In       2020-10-21 09:05:00    737
         2020-10-21 09:10:00      0
         2020-10-21 09:15:00     42
         2020-10-21 09:20:00      0
Out      2020-10-21 09:05:00     22
         2020-10-21 09:10:00      0
         2020-10-21 09:15:00      0
         2020-10-21 09:20:00      0
Unknown  2020-10-21 09:05:00      0
         2020-10-21 09:10:00      0
         2020-10-21 09:15:00     36
         2020-10-21 09:20:00      0
How can I achieve this?
1 Answer
Personally, I would keep the data in tabular form:
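As a rough sketch of what keeping the data in a table could look like: the version below is one possible reading of that idea, not necessarily the exact code intended here. It assumes a `pivot_table` on `flow`, pins the date to 2020-10-21 so the result matches the output shown in the question, and uses the illustrative names `wide` and `result`.

```python
import pandas as pd

data = {
    "time": ["9:07", "9:06", "9:18", "9:19"],
    "cars": [737, 22, 42, 36],
    "flow": ["In", "Out", "In", "Unknown"],
}
df = pd.DataFrame(data)

# Assumption: pin the date explicitly so the result does not depend on today's date.
df["time"] = pd.to_datetime("2020-10-21 " + df["time"], format="%Y-%m-%d %H:%M")

# Wide layout: one column per flow category, one row per observed minute.
wide = df.pivot_table(index="time", columns="flow", values="cars",
                      aggfunc="sum", fill_value=0)

# Add the missing minutes from 09:05 to 09:20, then sum into 5-minute bins.
idx = pd.date_range("2020-10-21 09:05", "2020-10-21 09:20",
                    freq="1min", name="time")
wide = wide.reindex(idx, fill_value=0).resample("5min").sum()

# Optional: return to the long (flow, time) layout requested in the question.
result = wide.unstack()
print(result)
```

Because the reindex happens after pivoting, the filler rows never get a `flow` value of 0, so every flow category ends up with the same four 5-minute bins.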