pandas: Creating a DataFrame with month start and end dates in Python

j13ufse2  posted on 2023-06-20  in Python

I want to create a pandas DataFrame from a given start and end date:

import pandas as pd
from pandas.tseries.offsets import MonthEnd
start_date = "2020-05-17"
end_date = "2020-07-23"

Each row of this DataFrame should hold the start and end of one month, so the expected output is:

start       end         month   year
2020-05-17  2020-05-31  May     2020
2020-06-01  2020-06-30  June    2020
2020-07-01  2020-07-23  July    2020

I know I have to loop over each month in the interval between start_date and end_date. I do know how to get the last day of the month for a given date:

def last_day(date: str):
    return pd.Timestamp(date) + MonthEnd(1)

but I don't know how to do this across the whole interval. Any suggestions would be appreciated.
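
A side note on the helper above: `MonthEnd(1)` always moves forward, so a date that is already the last day of its month lands on the end of the following month. A minimal sketch of a variant that leaves such dates unchanged, using `MonthEnd(0)`; the name `last_day_inclusive` is made up for illustration and is not part of the original question:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd


def last_day_inclusive(date: str) -> pd.Timestamp:
    # MonthEnd(0) rolls forward to the month end, but a date that is
    # already a month end is returned unchanged.
    return pd.Timestamp(date) + MonthEnd(0)


mid_month = last_day_inclusive("2020-05-17")     # rolls forward to the end of May
already_end = last_day_inclusive("2020-05-31")   # stays on the same day
rolled_over = pd.Timestamp("2020-05-31") + MonthEnd(1)  # jumps to the end of June
```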

ws51t4hk  #1

You can use pd.date_range and pd.to_datetime:

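# Month starts inside the interval, with the actual start date prepended
start = pd.to_datetime([start_date] + pd.date_range(start_date, end_date, freq='MS').tolist())
# Month ends inside the interval, with the actual end date appended
# (pandas 2.2+ deprecates the 'M' alias in favor of 'ME')
end = pd.to_datetime(pd.date_range(start_date, end_date, freq='M').tolist() + [end_date])
month = start.strftime('%B')
year = start.year

df = pd.DataFrame({'start': start, 'end': end, 'month': month, 'year': year})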

Output:

>>> df
       start        end month  year
0 2020-05-17 2020-05-31   May  2020
1 2020-06-01 2020-06-30  June  2020
2 2020-07-01 2020-07-23  July  2020
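
One caveat with this approach: if start_date is the first day of a month, or end_date is the last day of a month, the prepended or appended value duplicates an entry that the range already contains, so the start and end lists no longer line up row by row. A minimal sketch of one way around this, assuming date-only inputs, builds one monthly period per row and clips the first and last bounds. The helper name `month_spans` is hypothetical and not from the original answer:

```python
import pandas as pd


def month_spans(start_date: str, end_date: str) -> pd.DataFrame:
    start_ts = pd.Timestamp(start_date)
    end_ts = pd.Timestamp(end_date)
    # One monthly Period for every calendar month touched by the interval
    periods = pd.period_range(start_ts, end_ts, freq="M")
    # Clip each month's bounds to the requested interval
    starts = [max(p.start_time, start_ts) for p in periods]
    ends = [min(p.end_time.normalize(), end_ts) for p in periods]
    df = pd.DataFrame({"start": starts, "end": ends})
    df["month"] = df["start"].dt.month_name()
    df["year"] = df["start"].dt.year
    return df


df = month_spans("2020-05-17", "2020-07-23")
edge = month_spans("2020-05-01", "2020-06-30")
```

Here `df` holds the three rows from the question, and `edge` covers the case where both dates sit exactly on month boundaries.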

vc6uscn9  #2

You can transform the output of .isocalendar():

r = pd.date_range(start_date, end_date, freq="D").isocalendar()
out = (
    r.assign(month=r.index.month)
    .reset_index()
    .groupby(["year", "month"])["index"]
    .agg(("first", "last"))
    .reset_index()
)
print(out)

Prints:

   year  month      first       last
0  2020      5 2020-05-17 2020-05-31
1  2020      6 2020-06-01 2020-06-30
2  2020      7 2020-07-01 2020-07-23

To get the month names as strings:

out = out.rename(columns={'first':'start', 'last':'end'})
out['month'] = pd.to_datetime(out['month'], format='%m').dt.strftime('%b')
print(out)

Prints:

   year month      start        end
0  2020   May 2020-05-17 2020-05-31
1  2020   Jun 2020-06-01 2020-06-30
2  2020   Jul 2020-07-01 2020-07-23
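
Note that `.isocalendar()` returns the ISO year, which can differ from the calendar year in the days around New Year (for example, 2021-01-01 belongs to ISO year 2020), so grouping by it may split or merge months when the interval crosses a year boundary. A minimal sketch of an alternative that groups the daily range by calendar month with `to_period("M")`; the helper name `month_bounds` is an assumption made for illustration:

```python
import pandas as pd


def month_bounds(start_date: str, end_date: str) -> pd.DataFrame:
    days = pd.Series(pd.date_range(start_date, end_date, freq="D"))
    out = (
        days.groupby(days.dt.to_period("M"))  # one group per calendar month
        .agg(start="min", end="max")          # first and last day in each group
        .rename_axis("period")
        .reset_index()
    )
    out["year"] = out["period"].dt.year
    out["month"] = out["start"].dt.month_name()
    return out[["start", "end", "month", "year"]]


# An interval that crosses a year boundary
out = month_bounds("2020-12-30", "2021-01-05")
```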
