Pandas: filling in missing dates using an outer join and groupby

ttisahbt posted on 2023-04-28 in Other

This is my DataFrame:

import pandas as pd

# create sample data
data = {'store name': ['Store A', 'Store A', 'Store B', 'Store B', 'Store C', 'Store C'],
        'time': ['2023-04-25 00:00:00', '2023-04-25 01:00:00', '2023-04-25 00:00:00', '2023-04-25 01:00:00', '2023-04-25 00:00:00', '2023-04-25 01:00:00'],
        'sales': [100, 200, 150, 250, 300, 350]}

# create pandas dataframe
df = pd.DataFrame(data)

I want to fill in the missing hours on 04-25 for every store:

import datetime
start = datetime.datetime(2023, 4, 25)
end = datetime.datetime(2023, 4, 26)
full_range = pd.date_range(start, end, freq = 'H')
time = pd.DataFrame(full_range, columns = ['time'])
df = df.groupby('store name').apply(lambda x : pd.merge(x, time, on = 'time', how = 'outer')).reset_index()

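But this fails with an error saying that 'store name' already exists as a column. If I use reset_index(drop=True) instead, all of the added missing times are removed. What should I do?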

5t7ly7z5 #1

IIUC, you can use pivot (or pivot_table) to reshape your DataFrame, then reindex it on time:

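# 'time' holds strings in the sample data; convert it so it matches the DatetimeIndex below
df['time'] = pd.to_datetime(df['time'])

start = pd.Timestamp(2023, 4, 25)
end = pd.Timestamp(2023, 4, 26)
# hourly range, both endpoints included ('H' is deprecated since pandas 2.2; use 'h' there)
full_range = pd.date_range(start, end, freq='H', name='time')

# one column per store, add the missing hours, then go back to long format
out = (df.pivot(index='time', columns='store name', values='sales')
         .reindex(full_range, fill_value=0).unstack().rename('sales').reset_index())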

Output:

>>> out
   store name                time  sales
0     Store A 2023-04-25 00:00:00    100
1     Store A 2023-04-25 01:00:00    200
2     Store A 2023-04-25 02:00:00      0
3     Store A 2023-04-25 03:00:00      0
4     Store A 2023-04-25 04:00:00      0
..        ...                 ...    ...
70    Store C 2023-04-25 20:00:00      0
71    Store C 2023-04-25 21:00:00      0
72    Store C 2023-04-25 22:00:00      0
73    Store C 2023-04-25 23:00:00      0
74    Store C 2023-04-26 00:00:00      0
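If you would rather stay in long format, the same fill can probably be done without pivoting, by reindexing against every (store, hour) pair. This is only a sketch under a few assumptions: 'time' has been converted to datetime, each (store, time) pair appears at most once, and the names `full_index` and `out_alt` are made up for this illustration.

```python
import pandas as pd

# Sample data from the question, with 'time' parsed as datetime.
df = pd.DataFrame({
    "store name": ["Store A", "Store A", "Store B", "Store B", "Store C", "Store C"],
    "time": pd.to_datetime(["2023-04-25 00:00:00", "2023-04-25 01:00:00"] * 3),
    "sales": [100, 200, 150, 250, 300, 350],
})

# Hourly range from 04-25 00:00 to 04-26 00:00, both endpoints included.
full_range = pd.date_range("2023-04-25", "2023-04-26", freq="h", name="time")

# Every (store, hour) combination that should appear in the result.
full_index = pd.MultiIndex.from_product(
    [df["store name"].unique(), full_range], names=["store name", "time"]
)

# Reindex on the full grid; combinations absent from df get sales = 0.
out_alt = (
    df.set_index(["store name", "time"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(out_alt.head())
```

Because 'store name' is part of the index here, it comes back as a regular column after reset_index(), which sidesteps the "already exists" problem from the groupby/apply attempt.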

Update

What if my values span multiple columns, such as ['sales', 'traffic']?
For multiple value columns, you can use:

out = (df.pivot(index='time', columns='store name', values=['sales', 'traffic'])
         .reindex(full_range, fill_value=0).stack(level=1)
         .swaplevel().sort_index(level='store name').reset_index())
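To try this end to end, here is a small self-contained sketch. The sample data in the question has no 'traffic' column, so the values below are hypothetical and exist only to exercise the code; `df2` and `out_multi` are likewise names invented for this example.

```python
import pandas as pd

# Two stores, two recorded hours each; the 'traffic' numbers are made up.
df2 = pd.DataFrame({
    "store name": ["Store A", "Store A", "Store B", "Store B"],
    "time": pd.to_datetime(["2023-04-25 00:00:00", "2023-04-25 01:00:00"] * 2),
    "sales": [100, 200, 150, 250],
    "traffic": [10, 20, 15, 25],
})

full_range = pd.date_range("2023-04-25", "2023-04-26", freq="h", name="time")

out_multi = (
    # Columns become a two-level index: (value name, store name).
    df2.pivot(index="time", columns="store name", values=["sales", "traffic"])
       # Add the missing hours, filling every value column with 0.
       .reindex(full_range, fill_value=0)
       # Move the store level back into the rows.
       .stack(level=1)
       # Put 'store name' first, order by store then time, and flatten.
       .swaplevel()
       .sort_index(level="store name")
       .reset_index()
)
print(out_multi.head())
```

Each store ends up with 25 hourly rows, and both value columns are 0 wherever no record existed. Recent pandas versions may emit a FutureWarning about the stack implementation; the result for this data should be the same.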
