pandas: How to create a list of datetimes with a distance-like constraint?

Posted by qhhrdooz on 2023-08-01 in Other

My inputs are:
1. Two datetimes and a range:

import datetime
import pandas as pd

start_dt = datetime.datetime(2023, 7, 26, 6, 0, 0)
end_dt = datetime.datetime(2023, 7, 26, 15, 0, 0)
range_of_minutes = range(15, 201) # between 15 and 200 minutes

2. A pandas DataFrame:

df = pd.DataFrame(
    {'item':
     ['XXX01', 'XXX02', 'XXX03', 'XXX04',
      'XXX05', 'XXX06', 'XXX07', 'XXX08', 'XXX09']}
)


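I'm trying to create a single datetime column using the following logic:
For each consecutive item (starting from the second), generate a datetime between start_dt and end_dt. The interval in minutes between consecutive rows should be random and must respect range_of_minutes. The items are unique and there can be any number of them (odd or even).
The expected output looks like this (only the first two columns are needed; the third shows the gap to the previous row in minutes):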

item             datetime **datetimes are monotonic and range is respected**
0  XXX01  26/07/2023 06:00:00                           NaN
1  XXX02  26/07/2023 06:17:34                         15,52
2  XXX03  26/07/2023 06:53:55                         36,35
3  XXX04  26/07/2023 08:05:15                         71,33
4  XXX05  26/07/2023 09:54:10                        108,92
5  XXX06  26/07/2023 11:08:20                         74,17
6  XXX07  26/07/2023 11:30:20                            22
7  XXX08  26/07/2023 14:07:05                        156,75
8  XXX09  26/07/2023 14:45:08                         38,05


My desperate attempt:

import random

def r_interval():
    return random.randint(min(range_of_minutes), max(range_of_minutes))

df.loc[0, "datetime"] = pd.to_datetime(start_dt)

df["datetime"] = pd.to_datetime(start_dt) + [r_interval() + dt.shift() for dt in df["datetime"][1:]]


Any help would be greatly appreciated, guys.

2nbm6dog #1

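This is a very direct, somewhat naive approach, but if you don't have much data to process it works well, gives honestly random results, and is easy to code. On the other hand, it gets slow as the data grows.
If I understand correctly, the problem is to squeeze the time intervals so that they are still random but stay within certain bounds.
My suggestion is to generate random intervals, check whether they fit into the overall window, and if they don't, generate them again until they do.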

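%%time
# %%time is a Jupyter cell magic; drop the line above when running as a plain script.

import datetime
import random

import numpy as np

def r_interval():
    # Random whole number of minutes; both ends of range_of_minutes are included
    return random.randint(min(range_of_minutes), max(range_of_minutes))

def get_time_intervals(length, max_diff):
    # Rejection sampling: redraw all gaps until their total fits into max_diff minutes
    attempt = 0
    while True:
        intervals = np.cumsum([0] + [r_interval() for _ in range(length - 1)])
        if intervals[-1] - intervals[0] <= max_diff:
            return intervals, attempt + 1
        attempt += 1

intervals, number_of_attempts = get_time_intervals(
    df.shape[0],
    # total_seconds() also counts whole days; .seconds would silently drop them
    int((end_dt - start_dt).total_seconds() // 60)
)
df["datetime"] = [start_dt + datetime.timedelta(minutes=int(interval)) for interval in intervals]

print(f"number of attempts: {number_of_attempts}")
print(df)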

Output:

number of attempts: 16
    item            datetime
0  XXX01 2023-07-26 06:00:00
1  XXX02 2023-07-26 06:36:00
2  XXX03 2023-07-26 07:48:00
3  XXX04 2023-07-26 08:37:00
4  XXX05 2023-07-26 11:53:00
5  XXX06 2023-07-26 12:35:00
6  XXX07 2023-07-26 13:06:00
7  XXX08 2023-07-26 13:40:00
8  XXX09 2023-07-26 14:00:00
CPU times: user 6.34 ms, sys: 1.78 ms, total: 8.12 ms
Wall time: 8.89 ms
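
A result like the one above can be checked against the constraints from the question. The following is only a minimal sketch: `check_gaps` is a hypothetical helper name (it is not part of the answer), and it assumes the gaps are measured in minutes with both bounds inclusive.

```python
import datetime

import pandas as pd

# Hypothetical helper (not from the answer above): returns True when the
# datetimes stay inside [start_dt, end_dt] and every gap between consecutive
# rows, in minutes, lies in [min_gap, max_gap].
def check_gaps(datetimes, start_dt, end_dt, min_gap=15, max_gap=200):
    s = pd.Series(pd.to_datetime(list(datetimes)))
    # diff() gives the gap to the previous row; the first row has no gap
    gaps = s.diff().dt.total_seconds().div(60).iloc[1:]
    inside = s.iloc[0] >= pd.Timestamp(start_dt) and s.iloc[-1] <= pd.Timestamp(end_dt)
    return bool(inside and s.is_monotonic_increasing and gaps.between(min_gap, max_gap).all())

start_dt = datetime.datetime(2023, 7, 26, 6, 0, 0)
end_dt = datetime.datetime(2023, 7, 26, 15, 0, 0)

# First three rows of the output shown above: gaps of 36 and 72 minutes
sample = ["2023-07-26 06:00:00", "2023-07-26 06:36:00", "2023-07-26 07:48:00"]
print(check_gaps(sample, start_dt, end_dt))  # True
```

With a DataFrame built as in the question, the same check would be `check_gaps(df["datetime"], start_dt, end_dt)`.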

axkjgtzd #2

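Create the column in one go with a list comprehension: draw one random offset from start_dt per remaining row and sort the offsets so the datetimes come out monotonic. Note that it is each offset from start_dt, not each gap between consecutive rows, that is drawn from range_of_minutes, so neighbouring rows can end up closer than 15 minutes (rows 3 and 4 in the output are 2 minutes apart):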

import random

offsets = [0]+sorted(random.randint(min(range_of_minutes), max(range_of_minutes)) for _ in range(df.shape[0]-1))
df["datetime"] = [pd.to_datetime(start_dt)+pd.DateOffset(minutes=o) for o in offsets]

>>> df
    item            datetime
0  XXX01 2023-07-26 06:00:00
1  XXX02 2023-07-26 06:23:00
2  XXX03 2023-07-26 07:36:00
3  XXX04 2023-07-26 07:57:00
4  XXX05 2023-07-26 07:59:00
5  XXX06 2023-07-26 08:37:00
6  XXX07 2023-07-26 09:01:00
7  XXX08 2023-07-26 09:11:00
8  XXX09 2023-07-26 09:19:00
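
If the gap between consecutive rows (rather than the offset from start_dt) has to stay within range_of_minutes, one possible alternative is to draw the gaps one at a time and cap each draw so that the remaining rows can still fit before end_dt. This is only a sketch and not taken from either answer: `random_gaps` is a hypothetical name, gaps are assumed to be whole minutes, and the resulting distribution is random but not uniform over all valid schedules.

```python
import datetime
import itertools
import random

import pandas as pd

def random_gaps(n_gaps, total_minutes, min_gap=15, max_gap=200, rng=random):
    """Draw n_gaps integer gaps in [min_gap, max_gap] whose sum is <= total_minutes."""
    if n_gaps * min_gap > total_minutes:
        raise ValueError("window too short for this many items")
    gaps, remaining = [], total_minutes
    for i in range(n_gaps):
        gaps_left_after = n_gaps - i - 1
        # Reserve at least min_gap minutes for every gap still to be drawn
        upper = min(max_gap, remaining - gaps_left_after * min_gap)
        gap = rng.randint(min_gap, upper)
        gaps.append(gap)
        remaining -= gap
    # Later draws are more constrained than earlier ones; shuffle to hide that ordering
    rng.shuffle(gaps)
    return gaps

start_dt = datetime.datetime(2023, 7, 26, 6, 0, 0)
end_dt = datetime.datetime(2023, 7, 26, 15, 0, 0)
df = pd.DataFrame({"item": [f"XXX{i:02d}" for i in range(1, 10)]})

total = int((end_dt - start_dt).total_seconds() // 60)
gaps = random_gaps(len(df) - 1, total)
# Cumulative sums of the gaps are the offsets from start_dt; the first row has offset 0
offsets = pd.to_timedelta([0, *itertools.accumulate(gaps)], unit="min")
df["datetime"] = pd.Timestamp(start_dt) + offsets
print(df)
```

Unlike rejection sampling, this needs a single pass regardless of how tight the window is, at the cost of a less even spread of gap sizes.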
