pandas DataFrame groupby and resample: adding missing rows for every minute

uinbv5nw  posted on 2023-09-29 in Other

I have a short DataFrame and I want to:
1. Sort it by minute
2. Add rows for the missing minutes, from 09:05 to 09:20
3. Then resample it every 5 minutes

   time  cars     flow
0  9:07   737       In
1  9:06    22      Out
2  9:18    42       In
3  9:19    36  Unknown

Here is what I tried:

import pandas as pd

data = {'time': ["9:07", "9:06", "9:18", "9:19"],
        'cars': [737, 22, 42, 36],
        'flow': ["In", "Out", "In", "Unknown"]}

df = pd.DataFrame(data)
# one index entry per minute from 09:05 to 09:20 (dated today)
idx = pd.date_range("9:05", "09:20", freq="1min")
idx = idx.rename('time')

df = df.set_index('time')
df.index = pd.to_datetime(df.index)
df = df.reindex(idx, fill_value=0)   # missing minutes get 0 in every column

# '5min' is the same 5-minute frequency as the older alias '5T'
df = df.groupby('flow').resample('5min')['cars'].sum()     # how_many_volume
print(df)

It returns:

flow     time               
0        2020-10-21 09:05:00      0
         2020-10-21 09:10:00      0
         2020-10-21 09:15:00      0
         2020-10-21 09:20:00      0
In       2020-10-21 09:05:00    737
         2020-10-21 09:10:00      0
         2020-10-21 09:15:00     42
Out      2020-10-21 09:05:00     22
Unknown  2020-10-21 09:15:00     36

But what we want is:

In             2020-10-21 09:05:00    737
               2020-10-21 09:10:00    0
               2020-10-21 09:15:00    42
               2020-10-21 09:20:00    0
Out            2020-10-21 09:05:00    22
               2020-10-21 09:10:00    0
               2020-10-21 09:15:00    0
               2020-10-21 09:20:00    0
Unknown        2020-10-21 09:05:00    0
               2020-10-21 09:10:00    0
               2020-10-21 09:15:00    36
               2020-10-21 09:20:00    0

How can I achieve this?


qzwqbdag  #1

>>> # df: indexed by the parsed 'time' column, before reindex(fill_value=0)
>>> df2 = df.pivot(columns='flow').resample('5min').sum().stack()
>>> df2.index = df2.index.swaplevel(0, 1)
>>> df2.sort_index()
                              cars
flow    time                      
In      2020-10-20 09:05:00  737.0
        2020-10-20 09:10:00    0.0
        2020-10-20 09:15:00   42.0
Out     2020-10-20 09:05:00   22.0
        2020-10-20 09:10:00    0.0
        2020-10-20 09:15:00    0.0
Unknown 2020-10-20 09:05:00    0.0
        2020-10-20 09:10:00    0.0
        2020-10-20 09:15:00   36.0
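The output above stops at 09:15, while the question also asks for a 09:20 row per flow. The extra `0` group in the question comes from `reindex(fill_value=0)` writing 0 into the `flow` column as well. One possible way to get the full grid is to resample per group first and then reindex against every (flow, bin) pair. This is only a sketch: the fixed date `2020-10-21` and the names `summed`, `bins`, `full` and `result` are assumptions made for the example, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["9:07", "9:06", "9:18", "9:19"],
    "cars": [737, 22, 42, 36],
    "flow": ["In", "Out", "In", "Unknown"],
})
# Assumption: pin the date so the result is reproducible
# (the original code implicitly uses today's date).
df["time"] = pd.to_datetime("2020-10-21 " + df["time"].str.zfill(5))

# Sum cars per flow and 5-minute bin. Each group only covers the bins
# between its own first and last observation.
summed = df.set_index("time").groupby("flow")["cars"].resample("5min").sum()

# Every (flow, bin) combination from 09:05 to 09:20 inclusive.
bins = pd.date_range("2020-10-21 09:05", "2020-10-21 09:20",
                     freq="5min", name="time")
full = pd.MultiIndex.from_product(
    [sorted(df["flow"].unique()), bins], names=["flow", "time"]
)

# Missing combinations are filled with 0, so the dtype stays integer.
result = summed.reindex(full, fill_value=0)
print(result)
```

Because only the `cars` values are filled, `flow` never receives a 0 and no spurious group is created.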

Personally, I would keep the data in tabular form:

>>> df.pivot(columns='flow').resample('5min').sum()
                      cars              
flow                    In   Out Unknown
time                                    
2020-10-20 09:05:00  737.0  22.0     0.0
2020-10-20 09:10:00    0.0   0.0     0.0
2020-10-20 09:15:00   42.0   0.0    36.0
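If the wide table should also run through 09:20 and hold integers, something along these lines might work. It is a sketch under assumptions: the date is pinned to `2020-10-21`, `values="cars"` is passed so the columns are flat rather than a two-level header, and `bins` and `wide` are names invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["9:07", "9:06", "9:18", "9:19"],
    "cars": [737, 22, 42, 36],
    "flow": ["In", "Out", "In", "Unknown"],
})
# Assumption: a fixed date instead of today's date.
df["time"] = pd.to_datetime("2020-10-21 " + df["time"].str.zfill(5))

# Target 5-minute bins, 09:05 through 09:20 inclusive.
bins = pd.date_range("2020-10-21 09:05", "2020-10-21 09:20",
                     freq="5min", name="time")

wide = (
    df.set_index("time")
      .pivot(columns="flow", values="cars")  # one column per flow, NaN where absent
      .resample("5min").sum()                # NaN sums to 0.0 within each bin
      .reindex(bins, fill_value=0)           # add bins with no data at all (09:20)
      .astype(int)                           # back from float to integer counts
)
print(wide)
```

Calling `wide.stack()` afterwards gives the long (time, flow) layout if that is needed downstream.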
