pandas: create a new DataFrame with a new size from an old DataFrame

dbf7pr2w  posted on 2023-03-28 in Other

I have a df_train like this:

X1  
01-01-2020 | 1     
01-02-2020 | 2     
01-03-2020 | 3      
01-04-2020 | 4

Now I want to build another df with a datetime index.
I get the datetime index like this:

future_dates = pd.date_range(df_train.index.max(), periods=12, freq='M')

I want a new df that starts with a copy of the df_train values and holds the mean of df_train for the remaining dates.
Expected result:

X1  
  01-05-2020 | 1     
  01-06-2020 | 2     
  01-07-2020 | 3      
  01-08-2020 | 4 
  01-09-2020 | 2.5     
  01-10-2020 | 2.5     
  01-11-2020 | 2.5      
  01-12-2020 | 2.5 
  01-01-2021 | 2.5     
  01-02-2021 | 2.5     
  01-03-2021 | 2.5      
  01-04-2021 | 2.5

llycmphe1#

Convert the index with to_datetime (if it is not already converted):

df_train.index = pd.to_datetime(df_train.index, dayfirst=True)
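To illustrate why dayfirst=True matters for these labels, here is a small sketch. The index values are copied from the question; the variable names are only for illustration.

```python
import pandas as pd

idx = pd.Index(['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])

# Default parsing reads '01-02-2020' month-first, i.e. as 2 January 2020.
month_first = pd.to_datetime(idx)

# dayfirst=True reads the same label as 1 February 2020.
day_first = pd.to_datetime(idx, dayfirst=True)

# An explicit format is an unambiguous alternative.
explicit = pd.to_datetime(idx, format='%d-%m-%Y')

print(month_first)
print(day_first)
```

With the default, all four labels land in January, so the monthly spacing of the data would be lost.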

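Then build the future dates with the MonthBegin offset and the month-start frequency 'MS':

future_dates = pd.date_range(
    df_train.index.max() + pd.tseries.offsets.MonthBegin(1),
    periods=12,
    freq='MS'
)
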
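As a rough sketch of what the MonthBegin offset does here: adding MonthBegin(1) moves a timestamp to the next month start, so the new range begins one month after the last training date instead of overlapping it. The timestamps below are taken from the sample data; the mid-month date is only an illustration.

```python
import pandas as pd

last = pd.Timestamp('2020-04-01')  # last date in the sample df_train

# MonthBegin(1) rolls forward to the next month start,
# whether the date is already a month start or falls mid-month.
next_start = last + pd.offsets.MonthBegin(1)
mid_month = pd.Timestamp('2020-04-15') + pd.offsets.MonthBegin(1)

# Twelve month-start dates beginning after the training data.
future_dates = pd.date_range(next_start, periods=12, freq='MS')

# Without the offset, the range would start on the last training date itself.
overlapping = pd.date_range(last, periods=12, freq='MS')

print(future_dates[0], future_dates[-1])
```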
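Then create a new frame filled with the mean, and overwrite the first values (as many as df_train has rows) with the original ones: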

new_df = pd.DataFrame({'X1': df_train['X1'].mean()}, index=future_dates)
new_df.iloc[:df_train.shape[0], new_df.columns.get_loc('X1')] = df_train['X1'].values

new_df

X1
2020-05-01  1.0
2020-06-01  2.0
2020-07-01  3.0
2020-08-01  4.0
2020-09-01  2.5
2020-10-01  2.5
2020-11-01  2.5
2020-12-01  2.5
2021-01-01  2.5
2021-02-01  2.5
2021-03-01  2.5
2021-04-01  2.5

Or build the column from a list, using unpacking:

new_df = pd.DataFrame({
    'X1': [*df_train['X1'],
           *(len(future_dates) - len(df_train)) * [df_train['X1'].mean()]]
}, index=future_dates)

new_df

X1
2020-05-01  1.0
2020-06-01  2.0
2020-07-01  3.0
2020-08-01  4.0
2020-09-01  2.5
2020-10-01  2.5
2020-11-01  2.5
2020-12-01  2.5
2021-01-01  2.5
2021-02-01  2.5
2021-03-01  2.5
2021-04-01  2.5
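If this is needed more than once, the list-based approach could be wrapped in a small function. This is only a sketch: extend_with_mean and its parameters are hypothetical names, and it assumes a monthly DatetimeIndex as in the question.

```python
import pandas as pd

def extend_with_mean(df, column, periods):
    """Hypothetical helper: copy df[column] onto the first future dates,
    then pad with the column mean up to `periods` rows."""
    if periods < len(df):
        raise ValueError("periods must be at least len(df)")
    start = df.index.max() + pd.offsets.MonthBegin(1)
    future = pd.date_range(start, periods=periods, freq='MS')
    padding = [df[column].mean()] * (periods - len(df))
    return pd.DataFrame({column: [*df[column], *padding]}, index=future)

df_train = pd.DataFrame(
    {'X1': [1, 2, 3, 4]},
    index=pd.to_datetime(['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'],
                         dayfirst=True),
)
result = extend_with_mean(df_train, 'X1', 12)
print(result)
```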

Then restore the original format with DatetimeIndex.strftime:

new_df.index = new_df.index.strftime('%d-%m-%Y')
X1
01-05-2020  1.0
01-06-2020  2.0
01-07-2020  3.0
01-08-2020  4.0
01-09-2020  2.5
01-10-2020  2.5
01-11-2020  2.5
01-12-2020  2.5
01-01-2021  2.5
01-02-2021  2.5
01-03-2021  2.5
01-04-2021  2.5
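One thing to keep in mind: strftime returns an index of plain strings, not a DatetimeIndex, so date-based operations no longer work on it. A short sketch of the round trip, using a three-date example index:

```python
import pandas as pd

idx = pd.date_range('2020-05-01', periods=3, freq='MS')

# strftime produces string labels in day-month-year order.
as_text = idx.strftime('%d-%m-%Y')

# Parsing with the same format gives the datetimes back.
restored = pd.to_datetime(as_text, format='%d-%m-%Y')

print(list(as_text))
print(restored)
```

So it is probably best to format the index last, after all date arithmetic is done.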

All together:

import pandas as pd

df_train = pd.DataFrame({
    'X1': {'01-01-2020': 1, '01-02-2020': 2, '01-03-2020': 3, '01-04-2020': 4}
})

df_train.index = pd.to_datetime(df_train.index, dayfirst=True)
future_dates = pd.date_range(
    df_train.index.max() + pd.tseries.offsets.MonthBegin(1),
    periods=12,
    freq='MS'
)
new_df = pd.DataFrame({'X1': df_train['X1'].mean()}, index=future_dates)
new_df.iloc[:df_train.shape[0], new_df.columns.get_loc('X1')] = \
    df_train['X1'].values
new_df.index = new_df.index.strftime('%d-%m-%Y')

print(new_df)

l5tcr1uw2#

  • set_index() on the existing rows
  • create a DataFrame for the new rows
  • concat() them

import io
import pandas as pd

df_train = pd.read_csv(io.StringIO("""             X1  
01-01-2020 | 1     
01-02-2020 | 2     
01-03-2020 | 3      
01-04-2020 | 4  """), sep="|")
df_train = df_train.set_index(pd.to_datetime(df_train.index,  format="%d-%m-%Y "))
df_train.columns = [c.strip() for c in df_train.columns]

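# Start one month after the last training date; 'MS' yields month-start dates,
# which matches the expected result (freq='M' would yield month-end dates).
future_dates = pd.date_range(df_train.index.max() + pd.tseries.offsets.MonthBegin(1), periods=12, freq='MS')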
pd.concat([
    df_train.set_index(future_dates[0:len(df_train)]),
    pd.DataFrame(index=future_dates[len(df_train):]).assign(X1=df_train["X1"].mean())
])
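The same three steps can be sketched without read_csv, which makes the result easier to check. The frame is built directly from the question's values; head, tail and result are illustrative names.

```python
import pandas as pd

df_train = pd.DataFrame(
    {'X1': [1, 2, 3, 4]},
    index=pd.to_datetime(['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'],
                         dayfirst=True),
)
future_dates = pd.date_range(
    df_train.index.max() + pd.offsets.MonthBegin(1), periods=12, freq='MS'
)
n = len(df_train)

# Existing rows, relabelled with the first n future dates.
head = df_train.set_index(future_dates[:n])
# New rows, all holding the mean of the training column.
tail = pd.DataFrame({'X1': df_train['X1'].mean()}, index=future_dates[n:])

result = pd.concat([head, tail])
print(result)
```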

g6ll5ycj3#

Here is another approach (df is df_train with its index already converted to datetime):

df = df.reindex(pd.date_range(df.index.min(),periods=12,freq='MS'),fill_value=df['X1'].mean())

df = df.set_axis(df.index.shift(4))
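The shift of 4 above is the number of rows in the sample data. A sketch that derives it from len(df) instead, assuming a month-start DatetimeIndex (target and out are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {'X1': [1, 2, 3, 4]},
    index=pd.date_range('2020-01-01', periods=4, freq='MS'),
)
n = len(df)

# Extend the original index to 12 months, filling new rows with the mean.
target = pd.date_range(df.index.min(), periods=12, freq='MS')
out = df.reindex(target, fill_value=df['X1'].mean())

# Move every label n months forward so the data starts after the training period.
out = out.set_axis(target.shift(n))
print(out)
```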

Old answer:

future_dates = pd.date_range(df.index.max(), periods=12, freq='M') + pd.tseries.offsets.MonthBegin()
df2 = pd.DataFrame(index = future_dates).assign(X1 = pd.Series(df['X1'].to_numpy(),index=future_dates[0:4])).fillna(df.mean())

Output:

X1
2020-05-01  1.0
2020-06-01  2.0
2020-07-01  3.0
2020-08-01  4.0
2020-09-01  2.5
2020-10-01  2.5
2020-11-01  2.5
2020-12-01  2.5
2021-01-01  2.5
2021-02-01  2.5
2021-03-01  2.5
2021-04-01  2.5
