Pandas: how to change only the date and keep the time?

fquxozlt  posted on 2023-03-11 in Other

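I want to change the date of the 21:00 rows to the date of the most recent 15:00 row.
However, the most recent 15:00 row is not always 1 day earlier; sometimes it is 3 days earlier, sometimes 10.
Table: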

date
0   2018-08-21 14:56:00
1   2018-08-21 14:57:00
2   2018-08-21 14:58:00
3   2018-08-21 14:59:00
4   2018-08-21 15:00:00
5   2018-08-22 21:01:00
6   2018-08-22 21:02:00
7   2018-08-22 21:03:00
8   2018-08-22 21:04:00
9   2018-08-22 21:05:00

The 21:00 rows should still be dated August 21, not August 22.
Input:

import pandas as pd
y=[
 '2018-08-18 21:00:00',
 '2018-08-18 21:03:00',
 '2018-08-19 9:00:00',
 '2018-08-19 15:00:00',
 '2018-08-22 21:01:00',
 '2018-08-22 21:02:00',
 '2018-08-22 22:59:00',
 '2018-08-22 23:00:00',
 '2018-08-22 09:01:00',
 '2018-08-22 09:02:00',
 '2018-08-22 15:00:00',
 '2018-08-25 21:01:00',
 '2018-08-25 21:05:00',
 '2018-08-25 09:01:00',
  ]
df=pd.DataFrame(y,columns=['date'])
df['date'] = pd.to_datetime(df['date'])

Python version 3.8.5, pandas version 1.1.3

h79rfbju  #1

Use:

# if necessary, convert to datetimes and sort
# df['date'] = pd.to_datetime(df['date'])
# df = df.sort_values('date')
# keep the YYYY-MM-DD string only on rows in the 15:00 hour, then forward-fill it to the following rows
s = df['date'].dt.strftime('%Y-%m-%d').where(df['date'].dt.hour.eq(15)).ffill()
# select rows in the 21:00 hour that have an earlier 15:00 row
m = df['date'].dt.hour.eq(21) & s.notna()
# rebuild those timestamps from the carried 15:00 date and their own time of day
df.loc[m, 'date'] =  pd.to_datetime(s[m] + ' '  + df.loc[m, 'date'].dt.strftime('%H:%M:%S'))
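
To make the two helper objects easier to follow, here is a small illustrative run on four rows taken from the question's sample. It is only a sketch for inspection: `s` carries the date of the latest 15:00 row forward, and `m` marks the 21:00 rows for which such a date exists.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '2018-08-18 21:00:00',
    '2018-08-19 15:00:00',
    '2018-08-22 21:01:00',
    '2018-08-22 09:01:00',
])})

# date string only on rows in the 15:00 hour, missing elsewhere, then carried forward
s = df['date'].dt.strftime('%Y-%m-%d').where(df['date'].dt.hour.eq(15)).ffill()
# rows in the 21:00 hour that already have a carried 15:00 date
m = df['date'].dt.hour.eq(21) & s.notna()

print(s.tolist())
print(m.tolist())
```

Only the third row (2018-08-22 21:01:00) is selected by `m`. The first row is also in the 21:00 hour, but no 15:00 row precedes it, so `s` is still missing there and the row is left unchanged.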

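Another idea, tested at the end of this answer, is to subtract the difference between the two dates as a timedelta.

EDIT: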

y=[
 '2018-08-18 21:00:00',
 '2018-08-18 21:03:00',
 '2018-08-19 9:00:00',
 '2018-08-19 15:00:00',
 '2018-08-22 21:01:00',
 '2018-08-22 21:02:00',
 '2018-08-22 22:59:00',
 '2018-08-22 23:00:00',
 '2018-08-22 09:01:00',
 '2018-08-22 09:02:00',
 '2018-08-22 15:00:00',
 '2018-08-25 21:01:00',
 '2018-08-25 21:05:00',
 '2018-08-25 09:01:00',
  ]
df=pd.DataFrame(y,columns=['date'])
df['date'] = pd.to_datetime(df['date'])
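# keep the YYYY-MM-DD string only on rows in the 15:00 hour, then forward-fill it to the following rows
s = df['date'].dt.strftime('%Y-%m-%d').where(df['date'].dt.hour.eq(15)).ffill()
# select rows in the 21:00 hour that have an earlier 15:00 row
m = df['date'].dt.hour.eq(21) & s.notna()
# rebuild those timestamps from the carried 15:00 date and their own time of day
df.loc[m, 'date'] =  pd.to_datetime(s[m] + ' '  + df.loc[m, 'date'].dt.strftime('%H:%M:%S'))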

print (df)
                  date
0  2018-08-18 21:00:00
1  2018-08-18 21:03:00
2  2018-08-19 09:00:00
3  2018-08-19 15:00:00
4  2018-08-19 21:01:00
5  2018-08-19 21:02:00
6  2018-08-22 22:59:00
7  2018-08-22 23:00:00
8  2018-08-22 09:01:00
9  2018-08-22 09:02:00
10 2018-08-22 15:00:00
11 2018-08-22 21:01:00
12 2018-08-22 21:05:00
13 2018-08-25 09:01:00

Testing the other solution:

# dates at midnight (time of day removed)
dates = df['date'].dt.normalize()
# difference between each row's date and the date of the latest 15:00 row before it
s = dates.sub(dates.where(df['date'].dt.hour.eq(15)).ffill())
# select rows in the 21:00 hour that have an earlier 15:00 row
m = df['date'].dt.hour.eq(21) & s.notna()
# subtract that whole-day difference, which leaves the time of day untouched
df.loc[m, 'date'] -= s[m]

print (df)
                  date
0  2018-08-18 21:00:00
1  2018-08-18 21:03:00
2  2018-08-19 09:00:00
3  2018-08-19 15:00:00
4  2018-08-19 21:01:00
5  2018-08-19 21:02:00
6  2018-08-22 22:59:00
7  2018-08-22 23:00:00
8  2018-08-22 09:01:00
9  2018-08-22 09:02:00
10 2018-08-22 15:00:00
11 2018-08-22 21:01:00
12 2018-08-22 21:05:00
13 2018-08-25 09:01:00
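
If the same adjustment is needed in more than one place, the timedelta approach could be wrapped in a small helper. The following is only a sketch: the function name `move_to_last_anchor_date` and the parameters `anchor_hour` and `target_hour` are made up for illustration and are not part of pandas or of the original answer.

```python
import pandas as pd

def move_to_last_anchor_date(dates, anchor_hour=15, target_hour=21):
    """Return a copy of `dates` in which rows in `target_hour` take the calendar
    date of the most recent preceding row in `anchor_hour`; times are kept."""
    days = dates.dt.normalize()
    # calendar date of the latest anchor row seen so far (NaT before the first one)
    last_anchor_day = days.where(dates.dt.hour.eq(anchor_hour)).ffill()
    mask = dates.dt.hour.eq(target_hour) & last_anchor_day.notna()
    out = dates.copy()
    # subtract the whole-day gap only on the selected rows
    out[mask] = dates[mask] - (days[mask] - last_anchor_day[mask])
    return out

df = pd.DataFrame({'date': pd.to_datetime([
    '2018-08-21 14:59:00', '2018-08-21 15:00:00',
    '2018-08-22 21:01:00', '2018-08-22 21:02:00',
])})
df['date'] = move_to_last_anchor_date(df['date'])
print(df)
```

Returning a new Series instead of modifying the frame in place keeps the original column available for comparison.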

n9vozmp4  #2

You can use a cumulative Timedelta:

# ensure we have a datetime type
df['date'] = pd.to_datetime(df['date'])

# is the previous row in the 15:00 hour?
m1 = df['date'].dt.hour.eq(15).shift()
# is the current row in the 21:00 hour?
m2 = df['date'].dt.hour.eq(21)

# remove the extra whole days at every 15h->21h jump; the offset accumulates over later rows
# (pd.Timedelta(0) is used as the fill value so the result stays a timedelta Series)
df['date'] -= df['date'].diff().where(m1&m2, pd.Timedelta(0)).dt.floor('D').cumsum()

Output:

date
0 2018-08-21 14:56:00
1 2018-08-21 14:57:00
2 2018-08-21 14:58:00
3 2018-08-21 14:59:00
4 2018-08-21 15:00:00
5 2018-08-21 21:01:00
6 2018-08-21 21:02:00
7 2018-08-21 21:03:00
8 2018-08-21 21:04:00
9 2018-08-21 21:05:00

Intermediates (more complex example):

date     m1     m2      diff_where  floor cumsum
 0 2018-08-21 14:56:00    NaN  False 0 days 00:00:00 0 days 0 days
 1 2018-08-21 14:57:00  False  False 0 days 00:00:00 0 days 0 days
 2 2018-08-21 14:58:00  False  False 0 days 00:00:00 0 days 0 days
 3 2018-08-21 14:59:00  False  False 0 days 00:00:00 0 days 0 days
 4 2018-08-21 15:00:00  False  False 0 days 00:00:00 0 days 0 days
 5 2018-08-22 21:01:00   True   True 1 days 06:01:00 1 days 1 days
 6 2018-08-22 21:02:00  False   True 0 days 00:00:00 0 days 1 days
 7 2018-08-22 21:03:00  False   True 0 days 00:00:00 0 days 1 days
 8 2018-08-22 21:04:00  False   True 0 days 00:00:00 0 days 1 days
 9 2018-08-22 21:05:00  False   True 0 days 00:00:00 0 days 1 days
10 2018-08-23 14:59:00  False  False 0 days 00:00:00 0 days 1 days
11 2018-08-23 15:00:00  False  False 0 days 00:00:00 0 days 1 days
12 2018-08-25 21:01:00   True   True 2 days 06:01:00 2 days 3 days
13 2018-08-25 21:02:00  False   True 0 days 00:00:00 0 days 3 days

Reproducible input:

df = pd.DataFrame({'date': ['2018-08-21 14:56:00', '2018-08-21 14:57:00', '2018-08-21 14:58:00',
                            '2018-08-21 14:59:00', '2018-08-21 15:00:00', '2018-08-22 21:01:00',
                            '2018-08-22 21:02:00', '2018-08-22 21:03:00', '2018-08-22 21:04:00',
                            '2018-08-22 21:05:00']})
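
To inspect the intermediate columns yourself, a sketch along the following lines rebuilds them one step at a time. The six sample rows and the `steps` frame are chosen here for illustration, and `shift(fill_value=False)` is used so that `m1` stays boolean (an assumption of this sketch, which is why the first row shows False rather than NaN).

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '2018-08-21 14:59:00', '2018-08-21 15:00:00',
    '2018-08-22 21:01:00', '2018-08-22 21:02:00',
    '2018-08-23 15:00:00', '2018-08-25 21:01:00',
])})

hour = df['date'].dt.hour
m1 = hour.eq(15).shift(fill_value=False)  # previous row is in the 15:00 hour
m2 = hour.eq(21)                          # current row is in the 21:00 hour

steps = pd.DataFrame({
    'date': df['date'],
    'm1': m1,
    'm2': m2,
    # gap to the previous row, kept only at a 15h->21h jump
    'diff_where': df['date'].diff().where(m1 & m2, pd.Timedelta(0)),
})
steps['floor'] = steps['diff_where'].dt.floor('D')   # whole days of that gap
steps['cumsum'] = steps['floor'].cumsum()            # running total of removed days
steps['result'] = steps['date'] - steps['cumsum']
print(steps)
```

Because the offset is cumulative, it is subtracted from every later row, not only from the 21:00 rows: in this sample the 2018-08-23 15:00:00 row becomes 2018-08-22 15:00:00. That differs from the mask-based approach in answer #1, which changes 21:00 rows only.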
