pandas: extrapolate time series data into the future by repeating/scaling existing values

xfyts7mz posted on 2022-11-20 in Other

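I have hourly electricity consumption data for a single day. I want to use this data to "forecast" the hourly consumption for the following days. Each value on the next day should be the value for the same hour on the previous day, multiplied by a scaling factor f (e.g. 2).
The DataFrame df looks like this: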

load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0

I would like the output DataFrame df_ex to look like this:

load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0
2021-01-02 00:00:00   2.0
2021-01-02 01:00:00   2.0
2021-01-02 02:00:00   2.0
2021-01-02 03:00:00   2.0
2021-01-02 04:00:00   2.0
2021-01-02 05:00:00   2.0
2021-01-02 06:00:00   2.0
2021-01-02 07:00:00   6.0
2021-01-02 08:00:00   6.0
2021-01-02 09:00:00   6.0
2021-01-02 10:00:00   6.0
2021-01-02 11:00:00   6.0
2021-01-02 12:00:00   6.0
2021-01-02 13:00:00   6.0
2021-01-02 14:00:00   6.0
2021-01-02 15:00:00   6.0
2021-01-02 16:00:00   6.0
2021-01-02 17:00:00   6.0
2021-01-02 18:00:00   6.0
2021-01-02 19:00:00   6.0
2021-01-02 20:00:00   2.0
2021-01-02 21:00:00   2.0
2021-01-02 22:00:00   2.0
2021-01-02 23:00:00   2.0
2021-01-03 00:00:00   4.0
2021-01-03 01:00:00   4.0
2021-01-03 02:00:00   4.0
2021-01-03 03:00:00   4.0
2021-01-03 04:00:00   4.0
2021-01-03 05:00:00   4.0
2021-01-03 06:00:00   4.0
2021-01-03 07:00:00   12.0
2021-01-03 08:00:00   12.0
2021-01-03 09:00:00   12.0
2021-01-03 10:00:00   12.0
2021-01-03 11:00:00   12.0
2021-01-03 12:00:00   12.0
2021-01-03 13:00:00   12.0
2021-01-03 14:00:00   12.0
2021-01-03 15:00:00   12.0
2021-01-03 16:00:00   12.0
2021-01-03 17:00:00   12.0
2021-01-03 18:00:00   12.0
2021-01-03 19:00:00   12.0
2021-01-03 20:00:00   4.0
2021-01-03 21:00:00   4.0
2021-01-03 22:00:00   4.0
2021-01-03 23:00:00   4.0

I have tried the following solution (with df defined as above):

import pandas as pd
import datetime

start = '2021-01-01 00:00'
end = '2021-01-03 23:00'
freq = 'H'

index = pd.date_range(start,
                      end,
                      freq=freq)

df_ex = df.reindex(index)

i = df_ex.index[0].day
f = 2.0
df_ex.loc[df_ex.index.day == i+1] = df_ex.loc[df_ex.index.day == i] * f

print(df_ex)

The result is:

load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
...                   ...
2021-01-03 19:00:00   NaN
2021-01-03 20:00:00   NaN
2021-01-03 21:00:00   NaN
2021-01-03 22:00:00   NaN
2021-01-03 23:00:00   NaN

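My attempt to fill in the values for the days after the first one seems to fail. The index is a DatetimeIndex.
Any suggestions on how to fix this would be greatly appreciated!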
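A likely cause of the NaN values: when one `.loc` selection is assigned to another, pandas aligns the right-hand side on its index labels. The day-1 timestamps do not occur among the day-2 rows, so every aligned value becomes NaN, and day 3 is never touched at all. The sketch below, assuming the sample data from the question, strips the labels with `.to_numpy()` and loops over every added day; grouping by `index.normalize()` instead of `index.day` (an illustrative choice, not from the original) also keeps it working across month boundaries.

```python
import pandas as pd

# Sample data from the question: one day of hourly load values
df = pd.DataFrame(
    {"load_kWh": [1.0] * 7 + [3.0] * 13 + [1.0] * 4},
    index=pd.date_range("2021-01-01 00:00", "2021-01-01 23:00", freq="h"),
)

f = 2.0
index = pd.date_range("2021-01-01 00:00", "2021-01-03 23:00", freq="h")
df_ex = df.reindex(index)

days = df_ex.index.normalize()  # midnight timestamp of each row's day
unique_days = days.unique()
for prev_day, next_day in zip(unique_days[:-1], unique_days[1:]):
    prev_values = df_ex.loc[days == prev_day, "load_kWh"].to_numpy()
    # A plain NumPy array carries no index, so no label alignment (and no NaN) happens
    df_ex.loc[days == next_day, "load_kWh"] = prev_values * f

print(df_ex)
```

Because the loop runs in chronological order, day 3 is computed from the already filled day 2, matching the expected output.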

Answer #1 by 6ljaweal

To create the data, you need to iterate one day at a time.
Assuming the original data contains at least one full day, you can do:

import pandas as pd
import datetime as dt

start = "2021-01-01 00:00"
end = "2021-01-01 23:00"
freq = "h"  # hourly ("H" is deprecated in recent pandas versions)

# One day of hourly sample data: 7 h at 1.0, 13 h at 3.0, 4 h at 1.0
df = pd.DataFrame(
    {"load_kWh": [1.0] * 7 + [3.0] * 13 + [1.0] * 4},
    index=pd.date_range(start, end, freq=freq),
)

def add_days_to_df(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame:
    data = data.copy()
    for _ in range(number_of_days):
        # Take the most recent 24 hourly rows (the last full day)
        day = data.iloc[-24:].copy()
        # Shift them one day forward and scale them by k
        day.index = day.index + dt.timedelta(days=1)
        day = day * k
        data = pd.concat((data, day))
    return data

print(add_days_to_df(data=df, number_of_days=2, k=2.0))
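
Since each added day n is simply the last original day multiplied by k**n, the loop above can also be replaced by one vectorized step. This is a sketch, not part of the answer: it assumes the input ends with a complete hourly day (last row at 23:00, no gaps), and `extend_days_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def extend_days_vectorized(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame:
    # The last full day as a (24, n_columns) array
    base = data.iloc[-24:].to_numpy()
    # One scale factor per new day (k**1 .. k**number_of_days), repeated for each hour
    factors = np.repeat(k ** np.arange(1, number_of_days + 1), 24)
    # Stack the base day number_of_days times and scale row-wise via broadcasting
    new_values = np.tile(base, (number_of_days, 1)) * factors[:, None]
    # Hourly timestamps continuing right after the last existing row
    new_index = pd.date_range(
        data.index[-1] + pd.Timedelta(hours=1),
        periods=24 * number_of_days,
        freq="h",
    )
    new_rows = pd.DataFrame(new_values, index=new_index, columns=data.columns)
    return pd.concat((data, new_rows))
```

Called as `extend_days_vectorized(df, 2, 2.0)` on the sample data, it should give the same 72 rows as `add_days_to_df`; the trade-off is that it avoids repeated `pd.concat` calls, which matters only for long horizons.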

Answer #2 by 2skhul33

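I managed to get a partial solution that works for years instead of days (copying/scaling the data from the same day of the previous year). It is only a partial solution because leap years are not yet taken into account.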

import pandas as pd

def add_years_to_df(data: pd.DataFrame, target_year: int, k: float) -> pd.DataFrame:
    base_year = data.index[0].year
    add = data.copy()
    for year in range(base_year + 1, target_year + 1):
        # Scale the previous year's values by k
        add = k * add
        # Move the timestamps to the new year (Feb 29 / leap years not handled)
        add.index = add.index.map(lambda t: t.replace(year=year))
        data = pd.concat((data, add))
    return data

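lp is the input DataFrame, as described in my original question. target_year is the year up to which the data is extrapolated, and k is the multiplier.
To call the function, use add_years_to_df(data=lp, target_year=2030, k=1.1).
This results in: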

Datetime                load_kWh
2021-01-01 00:00:00     77.987500
2021-01-01 01:00:00     78.116667
2021-01-01 02:00:00     79.383333
2021-01-01 03:00:00     79.070833
2021-01-01 04:00:00     78.275000
...     ...
2030-12-31 19:00:00     247.373361
2030-12-31 20:00:00     74.889393
2030-12-31 21:00:00     71.883018
2030-12-31 22:00:00     73.101291
2030-12-31 23:00:00     72.438118
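
One way to close the leap-year gap is to build each new year's full hourly calendar first and look up the same month/day/hour in the base year, reusing Feb 28 when the base year has no Feb 29. This is a sketch under stated assumptions: the input covers exactly one full calendar year of tz-naive hourly data, and `add_years_leap_aware` / `same_slot_in_base_year` are hypothetical names. Year n is scaled by k**n, which matches the compounding in the function above.

```python
import calendar
import pandas as pd

def same_slot_in_base_year(t: pd.Timestamp, base_year: int) -> pd.Timestamp:
    # Feb 29 has no counterpart in a non-leap base year: reuse Feb 28 instead
    if t.month == 2 and t.day == 29 and not calendar.isleap(base_year):
        return t.replace(year=base_year, day=28)
    return t.replace(year=base_year)

def add_years_leap_aware(data: pd.DataFrame, target_year: int, k: float) -> pd.DataFrame:
    base_year = data.index[0].year
    frames = [data]
    for n, year in enumerate(range(base_year + 1, target_year + 1), start=1):
        # Full hourly calendar of the new year (8784 rows in leap years, 8760 otherwise)
        new_index = pd.date_range(f"{year}-01-01 00:00", f"{year}-12-31 23:00", freq="h")
        # Matching base-year timestamp for every new row (may contain Feb 28 twice)
        source_index = pd.DatetimeIndex(
            [same_slot_in_base_year(t, base_year) for t in new_index]
        )
        values = data.reindex(source_index).to_numpy() * k ** n
        frames.append(pd.DataFrame(values, index=new_index, columns=data.columns))
    return pd.concat(frames)
```

If the base year is itself a leap year, its Feb 29 is simply never looked up for non-leap target years, so every generated year has exactly the right number of hours.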
