pandas: date range with occurrence sum

Posted by 0md85ypi on 2023-01-04 in Other


I have a potentially large DataFrame containing datetimes from a date range query, like this:

0   2022-11-20 00:02:22.630968+00:00
1   2022-11-23 00:03:02.134938+00:00
2   2022-11-23 00:03:50.589251+00:00
3   2022-11-26 00:05:17.568843+00:00
4   2022-11-26 00:05:22.653905+00:00
5   2022-11-26 00:05:22.653905+00:00
6   2022-11-26 00:05:22.653905+00:00

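I need to reshape this into a list of dates with the number of occurrences of each date in a second column. Dates with no occurrences must be filled with zero, like this: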

2022-11-20 1
2022-11-21 0
2022-11-22 0
2022-11-23 2
2022-11-24 0
2022-11-25 0
2022-11-26 4

What is the most efficient way to achieve this with pandas?
In case it helps, the end goal is to feed this data to Calplot.
Thanks!

pandas

Source: https://stackoverflow.com/questions/74964742/date-range-with-occurrence-sum

3 Answers


kyks70gy1#

import pandas as pd

series = pd.Series([
    "2022-11-20T00:02:22.630968+00:00",
    "2022-11-23T00:03:02.134938+00:00",
    "2022-11-23T00:03:50.589251+00:00",
    "2022-11-26T00:05:17.568843+00:00",
    "2022-11-26T00:05:22.653905+00:00",
    "2022-11-26T00:05:22.653905+00:00",
    "2022-11-26T00:05:22.653905+00:00"
])

date_occurrences = pd.to_datetime(series).dt.date.value_counts()

# If your original series is sorted, you can just use the first and last value
start, end = date_occurrences.index.min(), date_occurrences.index.max()

all_dates = pd.date_range(start, end)

out = date_occurrences.reindex(all_dates, fill_value=0)

out is a pd.Series with the dates as the index and the counts as the values:

2022-11-20    1
2022-11-21    0
2022-11-22    0
2022-11-23    2
2022-11-24    0
2022-11-25    0
2022-11-26    4
Freq: D, dtype: int64
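
For comparison, the same daily counts can probably also be obtained with resample, which produces the empty days itself so no separate reindex step is needed. This is only a sketch under assumptions: the names stamps and daily are made up for illustration, the data is copied from the question, and dropping the UTC offset at the end is an optional step that is not part of the answer above.

```python
import pandas as pd

# Sample timestamps copied from the question.
stamps = pd.to_datetime(pd.Series([
    "2022-11-20T00:02:22.630968+00:00",
    "2022-11-23T00:03:02.134938+00:00",
    "2022-11-23T00:03:50.589251+00:00",
    "2022-11-26T00:05:17.568843+00:00",
    "2022-11-26T00:05:22.653905+00:00",
    "2022-11-26T00:05:22.653905+00:00",
    "2022-11-26T00:05:22.653905+00:00",
]))

# Put the timestamps in the index (the value 1 is just a placeholder),
# then count the rows that fall into each calendar-day bin.
# resample("D") creates a bin for every day between the first and last
# timestamp, and size() reports 0 for bins with no rows.
daily = pd.Series(1, index=pd.DatetimeIndex(stamps)).resample("D").size()

# Optional: drop the UTC offset so the index is timezone-naive.
daily = daily.tz_localize(None)

print(daily)
```

Whether this is faster than value_counts plus reindex on a large frame would need to be measured on the real data.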


sqserrrh2#

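You can use date_range to build a range of dates from the minimum to the maximum date in the original DataFrame, create a new DataFrame from that range, and then map its dates onto the value_counts() Series built from the original df.
Assuming your original DataFrame is df and the column containing the dates is date, you can do the following: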

idx = pd.date_range(df["date"].dt.date.min(), df["date"].dt.date.max())
out = pd.DataFrame(data=idx, columns=["date"])
out["num_occurrences"] = (
    out["date"].map(df["date"].dt.date.value_counts()).fillna(0).astype(int)
)
print(out)

        date  num_occurrences
0 2022-11-20                1
1 2022-11-21                0
2 2022-11-22                0
3 2022-11-23                2
4 2022-11-24                0
5 2022-11-25                0
6 2022-11-26                4
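
To try this approach end to end, a small df can be built from the question's sample data. The sketch below is an assumption-laden variant rather than the answer's exact code: the df contents are a stand-in for the real query result, the intermediate name counts is made up for illustration, and the explicit pd.to_datetime conversion of the counts index is an extra step added so that the lookup keys and the date column clearly share the same type.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the column name "date"
# follows the answer above.
df = pd.DataFrame({"date": pd.to_datetime([
    "2022-11-20T00:02:22.630968+00:00",
    "2022-11-23T00:03:02.134938+00:00",
    "2022-11-23T00:03:50.589251+00:00",
    "2022-11-26T00:05:17.568843+00:00",
    "2022-11-26T00:05:22.653905+00:00",
    "2022-11-26T00:05:22.653905+00:00",
    "2022-11-26T00:05:22.653905+00:00",
])})

# Occurrences per calendar date; the index holds datetime.date objects.
counts = df["date"].dt.date.value_counts()
# Convert that index to a DatetimeIndex so it matches the Timestamps
# produced by date_range below.
counts.index = pd.to_datetime(counts.index)

# Every day from the first to the last date, one row each.
idx = pd.date_range(df["date"].dt.date.min(), df["date"].dt.date.max())
out = pd.DataFrame({"date": idx})

# Days missing from counts map to NaN, which becomes 0.
out["num_occurrences"] = out["date"].map(counts).fillna(0).astype(int)
print(out)
```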


6ojccjat3#

Assuming df is your DataFrame and Datetime is the column name, here is a proposal based on pandas.Series.reindex:

import pandas as pd
from calplot import calplot

# Reduce each timestamp to its calendar date
ser = pd.to_datetime(df["Datetime"]).dt.date

(
    df
      .assign(Datetime=ser)
      .squeeze()          # assumes df has only the Datetime column
      .value_counts()
      .reindex(pd.date_range(ser.min(), ser.max()), fill_value=0)
      .pipe(lambda x: calplot(x, cmap='YlGn', colorbar=False))
)

Intermediate result (the Series passed to calplot):

2022-11-20    1
2022-11-21    0
2022-11-22    0
2022-11-23    2
2022-11-24    0
2022-11-25    0
2022-11-26    4
Freq: D, Name: Datetime, dtype: int64 <class 'pandas.core.series.Series'>
