Python - Roll weekend values forward to Monday

Posted by xdnvmnnf on 2023-03-21 in Python

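This question has a bounty that expires tomorrow; answers are eligible for a +50 reputation bounty. Evan wants to draw more attention to this question: I want to know how to roll/merge (sum) weekend values into the next Monday with a datetime index, so that the resulting DataFrame has no weekend dates.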

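I have a DataFrame (called df) that looks like this:

I am trying to take all weekend "Volume" values (rows where the "WEEKDAY" column is 5 (Saturday) or 6 (Sunday)) and add them to the following Monday (WEEKDAY = 0).
I tried several approaches, but none of them really worked. Taking the last three rows as an example:

What I expect is this:

To reproduce the problem: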

!wget https://raw.githubusercontent.com/brunodifranco/TCC/main/volume_por_dia.csv

import pandas as pd

df = pd.read_csv('volume_por_dia.csv').sort_values('Datas',ascending=True)
df['Datas'] = pd.to_datetime(df['Datas'])
df = df.set_index('Datas')
df['WEEKDAY'] = df.index.dayofweek
df

Answer 1 (zxlwwiss)

You can solve this with pandas' shift (Series.shift).

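import pandas as pd

# Volume of the previous row (Sunday, for a Monday row) and of the row before it (Saturday)
df['prior_volume'] = df.Volume.shift(1)
df['prior_volume2'] = df.Volume.shift(2)

# Add Saturday's and Sunday's volume to each Monday
is_monday = df['WEEKDAY'] == 0
df.loc[is_monday, 'Volume'] = (
    df.loc[is_monday, 'Volume']
    + df.loc[is_monday, 'prior_volume']
    + df.loc[is_monday, 'prior_volume2']
)

# Keep Monday to Friday only and drop the helper columns
df = df[df['WEEKDAY'].isin(range(5))]
df = df[['Volume', 'WEEKDAY']]
df.head(10)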

This produces:

Answer 2 (xghobddn)

I solved this using .groupby.

import pandas as pd

df = pd.read_csv('volume_por_dia.csv')
df['Datas'] = pd.to_datetime(df['Datas'])
df['WEEKDAY'] = df['Datas'].dt.dayofweek
df['index'] = df['Datas']

# Group by week with freq='W-MON': each weekly bin ends on a Monday (Tue through Mon),
# so every Sat and Sun lands in the same bin as the Monday that follows it
df_group = df.groupby([pd.Grouper(key='Datas', freq='W-MON'),
                       'WEEKDAY', 'index']).agg({'Volume': 'sum'})

# In each group, add days 5, 6 (Sat and Sun) to day 0 (Mon)
df_group.loc[(slice(None), 0), 'Volume'] += \
df_group.loc[(slice(None), [5, 6]), 'Volume'].groupby(level=0).sum()

# In the grouped data, remove Sat and Sun
df_group = df_group.reset_index()
df_group = df_group[df_group['WEEKDAY'] != 5]
df_group = df_group[df_group['WEEKDAY'] != 6]

# Remove volume data from original df, and merge with volume from df_group 
df = df.drop(['Volume'], axis=1)
df = pd.merge(df,df_group[['index','Volume']],on='index', how='left')
df = df.dropna(subset=['Volume'])
df = df.drop(['index'], axis=1)

# Optional: sort dates in ascending order
df = df.sort_values(by=['Datas'])

print (df)
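
A shorter variant of the same groupby idea, as a sketch only: move each Saturday and Sunday date forward to the following Monday, then sum by the moved date. The helper name roll_weekend_to_monday and the five-row frame are made up for illustration; the Datas and Volume names follow the question. Note one difference from the code above: a trailing weekend with no Monday row would create a new Monday row here.

```python
import numpy as np
import pandas as pd

def roll_weekend_to_monday(df, col="Volume"):
    """Hypothetical helper: sum Sat/Sun values of `col` into the following Monday."""
    wd = df.index.dayofweek.to_numpy()
    # Days to move forward: Sat (5) -> 2, Sun (6) -> 1, Mon-Fri -> 0
    days_ahead = np.where(wd >= 5, 7 - wd, 0)
    target = df.index + pd.to_timedelta(days_ahead, unit="D")
    # Rows that share a target date (Sat, Sun, Mon) are summed together
    out = df[[col]].groupby(target).sum()
    out.index.name = df.index.name
    out["WEEKDAY"] = out.index.dayofweek
    return out

# Illustrative data: Thu 2022-03-10 through Mon 2022-03-14
sample = pd.DataFrame(
    {"Volume": [24, 18, 4, 1, 5]},
    index=pd.date_range("2022-03-10", periods=5, freq="D", name="Datas"),
)
result = roll_weekend_to_monday(sample)
print(result)
```

Because the grouping key is the date itself, this does not depend on the rows being sorted or on every calendar day being present.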

Answer 3 (fzwojiic)

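You can simply loop over the rows, accumulating the volume starting from Saturday, and write the accumulated value into Monday's volume. Then just drop the Saturday and Sunday rows.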

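# Column position of Volume, used for positional assignment below
volume_col = df.columns.get_loc("Volume")

volume_accumulated = 0
for idx, (volume, weekday) in enumerate(zip(df["Volume"], df["WEEKDAY"])):
  if weekday in (5, 6):
    # Saturday or Sunday: keep accumulating
    volume_accumulated += volume
  elif weekday == 0:
    # Monday: add the accumulated weekend volume, then reset
    df.iloc[idx, volume_col] = volume + volume_accumulated
    volume_accumulated = 0
  else:
    volume_accumulated = 0

df = df[~df["WEEKDAY"].isin([5, 6])]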
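
If the loop is too slow on a large frame, one possible vectorized equivalent is sketched below. It is not the code from this answer: it labels each run of weekend rows with the same group id as the next non-weekend row and sums per group. That next row is the Monday only when no dates are missing, so treat this as a sketch under that assumption. The helper name and the sample frame are hypothetical.

```python
import pandas as pd

def roll_weekend_vectorized(df):
    """Hypothetical helper: add each weekend run to the next non-weekend row."""
    is_weekday = df["WEEKDAY"] < 5
    # Counting weekday rows from the bottom up gives Sat, Sun and the row right
    # after them the same id, because weekend rows do not advance the count.
    group_id = is_weekday[::-1].cumsum()[::-1].rename("group_id")
    out = df.copy()
    out["Volume"] = df.groupby(group_id)["Volume"].transform("sum")
    # Drop the Saturday and Sunday rows
    return out[is_weekday]

# Illustrative data: Thu 2022-03-10 through Mon 2022-03-14
sample = pd.DataFrame(
    {"Volume": [24, 18, 4, 1, 5]},
    index=pd.date_range("2022-03-10", periods=5, freq="D", name="Datas"),
)
sample["WEEKDAY"] = sample.index.dayofweek
result = roll_weekend_vectorized(sample)
print(result)
```

A weekend at the very end of the frame gets group id 0 and is simply dropped, since there is no later row to receive it.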

Answer 4 (t0ybt7op)

Input:

!wget https://raw.githubusercontent.com/brunodifranco/TCC/main/volume_por_dia.csv

import pandas as pd
import numpy as np

df = pd.read_csv('volume_por_dia.csv').sort_values('Datas',ascending=True)
df['Datas'] = pd.to_datetime(df['Datas'])
df.set_index('Datas', inplace=True)
df['WEEKDAY'] = df.index.dayofweek

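I assume that the index dates are sorted, that the Datas index is unique, and that there are no missing dates. Some assumptions I cannot make are:

  • that for each Monday I have the volumes of the complete preceding weekend; this could be false because the DataFrame might start on a Sunday, leaving an incomplete weekend volume;
  • that for each weekend I have the following Monday; this could be false because the DataFrame might end on a Saturday or Sunday.

For these reasons, before computing the weekend volumes, I first extract the dates of the first Saturday and the last Monday: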

first_saturday = df.index[df.WEEKDAY==5][0]
last_monday = df.index[df.WEEKDAY==0][-1]

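Now I can extract the weekend volumes, making sure that I always have Saturday-Sunday pairs and that, for each of these pairs, the following Monday exists in the DataFrame: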

df_weekend = df.loc[
    (df.WEEKDAY.isin([5,6]))&
    (df.index<=last_monday)&
    (df.index>=first_saturday)
]
df_weekend
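
To see what the two boundary dates buy you, here is a small sketch on a made-up two-week frame that starts on a Sunday and ends on a Saturday. The name demo and its Volume values are illustrative, not taken from the CSV.

```python
import pandas as pd

# Illustrative frame: starts on Sun 2022-03-06 and ends on Sat 2022-03-19
idx = pd.date_range("2022-03-06", "2022-03-19", freq="D", name="Datas")
demo = pd.DataFrame({"Volume": list(range(1, len(idx) + 1))}, index=idx)
demo["WEEKDAY"] = demo.index.dayofweek

first_saturday = demo.index[demo.WEEKDAY == 5][0]
last_monday = demo.index[demo.WEEKDAY == 0][-1]

# Same filter as above: weekend rows between the first Saturday and the last Monday
demo_weekend = demo.loc[
    demo.WEEKDAY.isin([5, 6])
    & (demo.index <= last_monday)
    & (demo.index >= first_saturday)
]
print(demo_weekend)
```

Only the complete pair 2022-03-12 / 2022-03-13 survives; the leading Sunday and the trailing Saturday are filtered out.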

Now, since I have pairs of Saturday and Sunday volumes, I can compute their sums as follows:

weekend_volumes = pd.Series(
    df_weekend.Volume.values.reshape(-1,2).sum(axis=1), #sum of volume couples
    index = df_weekend.index[1::2]+pd.Timedelta("1d"), #date of the following monday
    name="weekend_volume"
).reindex(df.index).fillna(0) #zero weekend-volume for days that are not mondays 
weekend_volumes
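
The reshape step may be easier to follow on a handful of values. The sketch below uses the six Saturday/Sunday volumes visible in the tail of df printed at the end of this answer (weekend rows are not modified by the final sum, so these are the raw values); the variable names are only for illustration.

```python
import pandas as pd

# Saturday/Sunday volumes copied from the printed tail of df
weekend = pd.Series(
    [2.0, 1.0, 5.0, 3.0, 8.0, 2.0],
    index=pd.to_datetime([
        "2022-02-19", "2022-02-20",
        "2022-02-26", "2022-02-27",
        "2022-03-05", "2022-03-06",
    ]),
)

# reshape(-1, 2) puts each (Saturday, Sunday) pair on one row; sum across the row
pair_sums = weekend.values.reshape(-1, 2).sum(axis=1)
# Every second date is a Sunday; adding one day gives the following Monday
mondays = weekend.index[1::2] + pd.Timedelta("1d")

weekend_volume = pd.Series(pair_sums, index=mondays, name="weekend_volume")
print(weekend_volume)
```

This only works when the weekend rows come in strict Saturday, Sunday order with none missing, which is why the boundary filter above matters.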

Finally, add the weekend volumes to the original volumes:

df["Volume"] = df.Volume+weekend_volumes

Below are the last 25 rows of df:

# 2022-02-18    16.0    4
# 2022-02-19    2.0     5
# 2022-02-20    1.0     6
# 2022-02-21    10.0    0
# 2022-02-22    43.0    1
# 2022-02-23    36.0    2
# 2022-02-24    38.0    3
# 2022-02-25    28.0    4
# 2022-02-26    5.0     5
# 2022-02-27    3.0     6
# 2022-02-28    14.0    0
# 2022-03-01    10.0    1
# 2022-03-02    16.0    2
# 2022-03-03    18.0    3
# 2022-03-04    11.0    4
# 2022-03-05    8.0     5
# 2022-03-06    2.0     6
# 2022-03-07    32.0    0
# 2022-03-08    18.0    1
# 2022-03-09    32.0    2
# 2022-03-10    24.0    3
# 2022-03-11    18.0    4
# 2022-03-12    4.0     5
# 2022-03-13    1.0     6
# 2022-03-14    10.0    0

Answer 5 (dwthyt8l)

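Adding 2 solutions here:
1. Using pd.shift (pointed out earlier by Lukas Hestermeyer; I have added a simplified version)
2. Using a rolling window (this is effectively a one-liner)

Both solutions assume that:

  1. Dates are sorted in ascending order (if not, sort them before proceeding)
  2. Every weekend (Saturday and Sunday) record is followed by a Monday record. If data is missing, additional checks are needed

Data preparation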

import pandas as pd
import numpy as np

# STEP 1: Create DF
Datas = [
    '2019-07-02',
    '2019-07-03',
    '2019-07-04',
    '2019-07-05',
    '2019-07-06',
    '2022-03-10',
    '2022-03-11',
    '2022-03-12',
    '2022-03-13',
    '2022-03-14'
]

Volume = [17, 30, 20, 21, 5, 24, 18, 4, 1, 5]
WEEKDAY = [1, 2, 3, 4, 5, 3, 4, 5, 6, 0]

dic = {'Datas': Datas, 'Volume': Volume, 'WEEKDAY': WEEKDAY}
df = pd.DataFrame(dic)
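
The two assumptions listed above can be checked before running either solution. The following is a rough sketch of such a check; the helper name weekends_without_monday is hypothetical and the dates are the same ones used in the data preparation.

```python
import pandas as pd

def weekends_without_monday(dates):
    """Hypothetical helper: return weekend dates whose following Monday is absent."""
    dates = pd.DatetimeIndex(dates)
    weekend = dates[dates.dayofweek >= 5]
    # Days until the next Monday: Sat (5) -> 2, Sun (6) -> 1
    days_ahead = 7 - weekend.dayofweek.to_numpy()
    next_monday = weekend + pd.to_timedelta(days_ahead, unit="D")
    return weekend[~next_monday.isin(dates)]

sample_dates = pd.to_datetime([
    "2019-07-02", "2019-07-03", "2019-07-04", "2019-07-05", "2019-07-06",
    "2022-03-10", "2022-03-11", "2022-03-12", "2022-03-13", "2022-03-14",
])

is_sorted = sample_dates.is_monotonic_increasing  # assumption 1
orphans = weekends_without_monday(sample_dates)   # assumption 2
print(is_sorted)
print(orphans)
```

On this sample the dates are sorted, but 2019-07-06 is a Saturday with no following Monday row, so its volume is not rolled into any Monday by either solution.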

Solution 1 [pd.shift]:

# STEP 1: add shifts
df['shift_1'] = df['Volume'].shift(1)
df['shift_2'] = df['shift_1'].shift(1)

# STEP 2: sum Volume with shifts where weekday==0
cols_to_sum = ['Volume', 'shift_1', 'shift_2']
df['Volume'] = df[['WEEKDAY'] + cols_to_sum].apply(lambda x: int(x['Volume']) if x['WEEKDAY'] else int(x['Volume'] + x['shift_1'] + x['shift_2']), axis=1)
df = df.drop(['shift_1', 'shift_2'], axis=1)
df

Solution 2 [rolling window]:

# use rolling window of size 3 to sum where weekday == 0
df['Volume'] = np.where(
    df['WEEKDAY'] == 0,  
    df['Volume'].rolling(window=3, center=False).sum(), 
    df['Volume']
)
df
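
The question also asks for a frame without weekend dates, which the one-liner leaves in place. Below is a sketch of the rolling-window approach followed by that extra filtering step, on the same sample values as the data preparation; the names demo, rolled and weekdays_only are illustrative.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Datas": pd.to_datetime([
        "2019-07-02", "2019-07-03", "2019-07-04", "2019-07-05", "2019-07-06",
        "2022-03-10", "2022-03-11", "2022-03-12", "2022-03-13", "2022-03-14",
    ]),
    "Volume": [17, 30, 20, 21, 5, 24, 18, 4, 1, 5],
})
demo["WEEKDAY"] = demo["Datas"].dt.dayofweek

# Sum of the current row and the two rows before it (Sat + Sun + Mon on a Monday)
rolled = demo["Volume"].rolling(window=3).sum()
demo["Volume"] = np.where(demo["WEEKDAY"] == 0, rolled, demo["Volume"])

# Drop Saturday and Sunday rows once their volume has been added to Monday
weekdays_only = demo[demo["WEEKDAY"] < 5].reset_index(drop=True)
print(weekdays_only)
```

The Monday 2022-03-14 ends up with 4 + 1 + 5 = 10. The Saturday 2019-07-06 has no Monday after it in this sample, so its volume is discarded by the final filter.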
