How to group by time interval in pandas

Posted by 8qgya5xd on 2023-10-14 in Other

I have this DataFrame:

`print(DFrame)

dispatch_time   count
0   2018-08-13 00:02:27 26
1   2018-08-13 00:03:47 24
2   2018-08-13 00:19:36 25
3   2018-08-13 00:21:12 25
4   2018-08-13 00:22:47 25
... ... ...
636 2018-08-14 23:16:44 33
637 2018-08-14 23:30:33 25
638 2018-08-14 23:34:22 33
639 2018-08-14 23:41:14 79
640 2018-08-14 23:47:29 35`

Then I check the column types with

`DFrame.dtypes

dispatch_time    object
count             int64
dtype: object

`
because I sometimes run into problems when I use the code below:

Splot = []
for i in DFrame['dispatch_time']:
    d = i.split(".")[0]
    Splot.append(d)

DFrame['dispatch_time'] = Splot

With this code I get the frame without milliseconds. My question is: how can I then group the data into 2-hour and daily intervals? I tried
DFrame['dispatch_time'] = pd.to_datetime(DFrame['dispatch_time'])
and then
DFrame = DFrame.resample('2H').sum()
which gives me:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[150], line 1
----> 1 DFrame = DFrame.resample('2H').sum()

File c:\users\roexz\appdata\local\programs\python\python39\lib\site-packages\pandas\core\frame.py:10999, in DataFrame.resample(self, rule, axis, closed, label, convention, kind, on, level, origin, offset, group_keys)
  10984 @doc(NDFrame.resample, **_shared_doc_kwargs)
  10985 def resample(
  10986     self,
   (...)
  10997     group_keys: bool = False,
  10998 ) -> Resampler:
> 10999     return super().resample(
  11000         rule=rule,
  11001         axis=axis,
  11002         closed=closed,
  11003         label=label,
  11004         convention=convention,
  11005         kind=kind,
  11006         on=on,
  11007         level=level,
  11008         origin=origin,
  11009         offset=offset,
  11010         group_keys=group_keys,
  11011     )

File c:\users\roexz\appdata\local\programs\python\python39\lib\site-packages\pandas\core\generic.py:8888, in NDFrame.resample(self, rule, axis, closed, label, convention, kind, on, level, origin, offset, group_keys)
   8885 from pandas.core.resample import get_resampler
   8887 axis = self._get_axis_number(axis)
-> 8888 return get_resampler(
   8889     cast("Series | DataFrame", self),
   8890     freq=rule,
   8891     label=label,
   8892     closed=closed,
   8893     axis=axis,
   8894     kind=kind,
   8895     convention=convention,
   8896     key=on,
   8897     level=level,
   8898     origin=origin,
   8899     offset=offset,
   8900     group_keys=group_keys,
   8901 )

File c:\users\roexz\appdata\local\programs\python\python39\lib\site-packages\pandas\core\resample.py:1523, in get_resampler(obj, kind, **kwds)
   1519 """
   1520 Create a TimeGrouper and return our resampler.
   1521 """
   1522 tg = TimeGrouper(**kwds)
-> 1523 return tg._get_resampler(obj, kind=kind)

File c:\users\roexz\appdata\local\programs\python\python39\lib\site-packages\pandas\core\resample.py:1713, in TimeGrouper._get_resampler(self, obj, kind)
   1704 elif isinstance(ax, TimedeltaIndex):
   1705     return TimedeltaIndexResampler(
   1706         obj,
   1707         timegrouper=self,
   (...)
   1710         gpr_index=ax,
   1711     )
-> 1713 raise TypeError(
   1714     "Only valid with DatetimeIndex, "
   1715     "TimedeltaIndex or PeriodIndex, "
   1716     f"but got an instance of '{type(ax).__name__}'"
   1717 )

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
Answer #1 (50few1ms)

You can use pd.Grouper with the freq= parameter:

df["dispatch_time"] = pd.to_datetime(df["dispatch_time"])

df = df.groupby(pd.Grouper(key="dispatch_time", freq="2H")).sum()

print(df)

Prints:

                     count
dispatch_time             
2018-08-13 00:00:00    125
2018-08-13 02:00:00      0
2018-08-13 04:00:00      0
2018-08-13 06:00:00      0

...
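The question also asks about daily intervals, which only needs a different `freq`. Below is a minimal sketch, assuming just the ten rows that are visible in the question (the elided middle rows are left out, so the totals only cover those ten rows). It uses the lowercase `"2h"` spelling, since newer pandas releases deprecate the uppercase `"2H"` alias; the variable names `two_hourly` and `daily` are only illustrative.

```python
import pandas as pd

# The ten rows shown in the question; the rows hidden behind "..." are omitted.
df = pd.DataFrame({
    "dispatch_time": [
        "2018-08-13 00:02:27", "2018-08-13 00:03:47", "2018-08-13 00:19:36",
        "2018-08-13 00:21:12", "2018-08-13 00:22:47", "2018-08-14 23:16:44",
        "2018-08-14 23:30:33", "2018-08-14 23:34:22", "2018-08-14 23:41:14",
        "2018-08-14 23:47:29",
    ],
    "count": [26, 24, 25, 25, 25, 33, 25, 33, 79, 35],
})

# Grouping by time needs real datetimes, not strings (dtype object).
df["dispatch_time"] = pd.to_datetime(df["dispatch_time"])

# One row per 2-hour bin; bins without any rows get a sum of 0.
two_hourly = df.groupby(pd.Grouper(key="dispatch_time", freq="2h")).sum()

# One row per calendar day.
daily = df.groupby(pd.Grouper(key="dispatch_time", freq="D")).sum()

print(two_hourly.head())
print(daily)
```

With this sample, the first 2-hour bin sums to 125 and the last to 205, which matches the output shown in the answers.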
Answer #2 (yc0p9oo0)

Since your index is not the dispatch_time column, you have to tell resample which column to bin on, using the on parameter:

# Convert as datetime64
DFrame['dispatch_time'] = pd.to_datetime(DFrame['dispatch_time'])

out = DFrame.resample('2H', on='dispatch_time').sum()

Output:

>>> out
                     count
dispatch_time             
2018-08-13 00:00:00    125
2018-08-13 02:00:00      0
2018-08-13 04:00:00      0
2018-08-13 06:00:00      0
2018-08-13 08:00:00      0
...
2018-08-14 14:00:00      0
2018-08-14 16:00:00      0
2018-08-14 18:00:00      0
2018-08-14 20:00:00      0
2018-08-14 22:00:00    205

From the resample documentation:

on : str, optional

For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.
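To make the connection to the TypeError in the question concrete, here is a small sketch under a few assumptions: it uses only four of the rows visible in the question, the lowercase `"2h"` alias (the newer spelling of `"2H"`), and illustrative names such as `via_on`, `via_index` and `non_empty`. It shows that `on=` and `set_index` lead to the same result, and that calling `resample` with neither raises the error from the traceback.

```python
import pandas as pd

# Four of the rows visible in the question, used as a small sample.
DFrame = pd.DataFrame({
    "dispatch_time": [
        "2018-08-13 00:02:27", "2018-08-13 00:19:36",
        "2018-08-14 23:41:14", "2018-08-14 23:47:29",
    ],
    "count": [26, 25, 79, 35],
})
DFrame["dispatch_time"] = pd.to_datetime(DFrame["dispatch_time"])

# Option 1: keep the default RangeIndex and point resample at the column.
via_on = DFrame.resample("2h", on="dispatch_time").sum()

# Option 2: make the column the index, so resample needs no `on`.
via_index = DFrame.set_index("dispatch_time").resample("2h").sum()

# Without `on` and without a datetime index, resample cannot build its bins.
error_message = None
try:
    DFrame.resample("2h")
except TypeError as exc:
    error_message = str(exc)

# Optional: drop the empty 2-hour bins that were filled with 0.
non_empty = via_on[via_on["count"] > 0]

print(via_on.head())
print(error_message)
print(non_empty)
```

Which option to pick is mostly a matter of taste: `on=` leaves the original frame untouched, while `set_index` is convenient if several time-based operations follow.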
