pandas: create a column based on pandas.DataFrame.between_time() without using the time as the index column

Posted by dsf9zpds on 2022-11-27 in Other

I have a DataFrame with a date/time column stored in seconds, which I converted:

df["start"]   = pd.to_datetime(df["start"], unit='s')
df["time"]    = df["start"].dt.time

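Now I want to add a column df["timeofday"] that contains the time of day as a string:
night: 0:00 - 5:59
morning: 6:00 - 11:59
afternoon: 12:00 - 17:59
evening: 18:00 - 21:59
night: 22:00 - 23:59
I assume I need a for loop and between_time(). However, that does not work, because it seems I would need to use the time column as the DataFrame's index, and the DataFrame already has an index that I don't want to lose. Even if I could add a second index and then filter for each time period, it is not clear to me how to insert the corresponding string into the new timeofday column.
I tried filtering with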

df.time.between_time('02:00', '03:30')

This results in
TypeError: Index must be DatetimeIndex
So I assumed I needed to set the time column as the new index:

df.set_index("time", inplace=True)
df["timeofday"] = 'night'
df["timeofday"][df.time.between_time('06:00', '11:59')] = "morning"

This results in the same
TypeError: Index must be DatetimeIndex
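As a hedged aside on why both attempts fail: between_time() filters on the index, and it only accepts a DatetimeIndex. A column of datetime.time objects does not qualify, whether it is accessed as a Series or set as the index. The sketch below uses made-up sample data (not the asker's real data) to reproduce the error and to show that an index built from the full datetime column is accepted:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"start": pd.to_datetime([
    "2022-11-26 01:41:26",
    "2022-11-26 06:07:06",
    "2022-11-26 13:45:08",
])})
df["time"] = df["start"].dt.time

# An index of datetime.time objects is a plain object Index, not a DatetimeIndex.
try:
    df.set_index("time").between_time("06:00", "11:59")
    raised_type_error = False
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)

# An index built from the full datetime column is a DatetimeIndex, so this works.
morning = df.set_index("start").between_time("06:00", "11:59")
print(morning)
```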
After that I tried

df.set_index("start", inplace=True)
df["timeofday"] = 'night'
df["timeofday"][df.between_time('06:00', '11:59')] = "morning"

which results in
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
InvalidIndexError


#1 wz8daaqr

Found a solution:

df.set_index("start", inplace=True)
df["timeofday"] = 'night'
mask = df.between_time('06:00', '11:59')
df.loc[mask.index, 'timeofday'] = "morning"
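The snippet above replaces the existing index with start. If the original index has to stay in place, one possible variation is to build a temporary DatetimeIndex from the column and ask it for the matching row positions with DatetimeIndex.indexer_between_time(). This is only a sketch under assumptions: the sample data and the index labels "a" to "d" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the custom index labels stand in for
# the existing index that should not be lost.
df = pd.DataFrame(
    {"start": pd.to_datetime([
        "2022-11-26 01:41:26",
        "2022-11-26 06:07:06",
        "2022-11-26 10:01:44",
        "2022-11-26 13:45:08",
    ])},
    index=["a", "b", "c", "d"],
)

df["timeofday"] = "night"

# Build a temporary DatetimeIndex from the column; df's own index is untouched.
positions = pd.DatetimeIndex(df["start"]).indexer_between_time("06:00", "11:59")

# positions holds integer row locations, so the assignment goes through iloc.
df.iloc[positions, df.columns.get_loc("timeofday")] = "morning"
print(df)
```

Note that "11:59" means 11:59:00, so a timestamp such as 11:59:30 would not be matched by either variant.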

#2 3htmauhk

We can use pandas.DataFrame.loc and pandas.Series.between to accomplish this.

Solution

import pandas as pd
from io import StringIO

# Example data with expected result so we can check our work later
input_data = """
start,expected_timeofday
2022-11-26 01:41:26,night
2022-11-26 03:13:06,night
2022-11-26 04:40:58,night
2022-11-26 06:07:06,morning
2022-11-26 06:27:14,morning
2022-11-26 06:28:16,morning
2022-11-26 07:34:46,morning
2022-11-26 10:01:44,morning
2022-11-26 13:45:08,afternoon
2022-11-26 15:40:36,afternoon
2022-11-26 15:59:00,afternoon
2022-11-26 16:51:03,afternoon
2022-11-26 17:15:42,afternoon
2022-11-26 18:24:02,evening
2022-11-26 18:34:37,evening
2022-11-26 19:21:00,evening
2022-11-26 19:41:17,evening
2022-11-26 21:53:10,evening
2022-11-26 23:16:29,night
2022-11-26 23:36:08,night
""".strip()

# Read example data from CSV-formatted string
df = pd.read_csv(StringIO(input_data), parse_dates=['start'])

class TimeOfDay():
    MORNING = 'morning'
    AFTERNOON = 'afternoon'
    EVENING = 'evening'
    NIGHT = 'night'

# Set `timeofday` category by using a filter on the
# hour property of the datetime column `start`.
df['timeofday'] = None
df.loc[df.start.dt.hour.between(0, 6, inclusive='left'), 'timeofday'] = TimeOfDay.NIGHT
df.loc[df.start.dt.hour.between(6, 12, inclusive='left'), 'timeofday'] = TimeOfDay.MORNING
df.loc[df.start.dt.hour.between(12, 18, inclusive='left'), 'timeofday'] = TimeOfDay.AFTERNOON
df.loc[df.start.dt.hour.between(18, 22, inclusive='left'), 'timeofday'] = TimeOfDay.EVENING
df.loc[df.start.dt.hour.between(22, 24, inclusive='left'), 'timeofday'] = TimeOfDay.NIGHT

# Check our work; raises an exception if we made a mistake
assert((df.timeofday == df.expected_timeofday).all())

# Result
print(df)

Result

                 start expected_timeofday  timeofday
0  2022-11-26 01:41:26              night      night
1  2022-11-26 03:13:06              night      night
2  2022-11-26 04:40:58              night      night
3  2022-11-26 06:07:06            morning    morning
4  2022-11-26 06:27:14            morning    morning
5  2022-11-26 06:28:16            morning    morning
6  2022-11-26 07:34:46            morning    morning
7  2022-11-26 10:01:44            morning    morning
8  2022-11-26 13:45:08          afternoon  afternoon
9  2022-11-26 15:40:36          afternoon  afternoon
10 2022-11-26 15:59:00          afternoon  afternoon
11 2022-11-26 16:51:03          afternoon  afternoon
12 2022-11-26 17:15:42          afternoon  afternoon
13 2022-11-26 18:24:02            evening    evening
14 2022-11-26 18:34:37            evening    evening
15 2022-11-26 19:21:00            evening    evening
16 2022-11-26 19:41:17            evening    evening
17 2022-11-26 21:53:10            evening    evening
18 2022-11-26 23:16:29              night      night
19 2022-11-26 23:36:08              night      night
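
The five .loc assignments above can likely be condensed into a single binning step with pandas.cut on the hour. This is an alternative sketch, not the answerer's method. It assumes a pandas version where cut accepts ordered=False (believed to be 1.1 or later), which is needed because the label "night" appears twice. The sample data is a made-up subset of the rows above.

```python
import pandas as pd

# Hypothetical subset of the example data, one row per time-of-day bucket.
df = pd.DataFrame({"start": pd.to_datetime([
    "2022-11-26 01:41:26",
    "2022-11-26 06:07:06",
    "2022-11-26 13:45:08",
    "2022-11-26 18:24:02",
    "2022-11-26 23:16:29",
])})

# Bin edges in hours; right=False makes every bin half-open: [left, right).
bins = [0, 6, 12, 18, 22, 24]
labels = ["night", "morning", "afternoon", "evening", "night"]

# ordered=False allows the duplicate "night" label for the first and last bin.
df["timeofday"] = pd.cut(
    df["start"].dt.hour, bins=bins, labels=labels, right=False, ordered=False
)
print(df)
```

The new column has a categorical dtype; calling .astype(str) on it yields plain strings if those are preferred.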
