Python:我的代码很慢,使用for循环和Pandas Dataframe

iugsix8n  于 2023-05-15  发布在  Python
关注(0)|答案(3)|浏览(144)

我得到了下面的代码,它工作正常,但需要太多的时间来执行(> 30 s):

df['Date_engagement'] = np.nan
for i in range(df.shape[0]):
if not(pd.isna(df.loc[i,'Engagement'])):
    df.loc[i, 'Date_engagement'] = np.busday_offset(
        df.loc[i, 'Date_réception'].date(),
        df.loc[i,'Engagement'], roll='backward', busdaycal=france_bdc
    )

我想这是因为我循环了Pandas数据框,我想知道是否有一种方法可以实现相同的结果,而不使用任何循环。
基本上,此代码所做的是创建***'Date_engagement'列,如果'Engagement'不为空,则该列将是将'Engagement'天添加到'Date_reception'的结果。我使用'np.busday_offset()'***来避开节假日。
提前感谢您的帮助…
我尝试使用Pandas的 Dataframe 功能(如***DataFrame.apply()***),但无法找到如何正确地执行以获得与解释相同的结果。

mznpcxlj

mznpcxlj1#

假设输入 Dataframe :

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date_réception': ['2023-05-12', '2023-05-13'],
                   'Engagement': [10, np.nan]})
print(df)

# Output
  Date_réception  Engagement
0     2023-05-12        10.0
1     2023-05-13         NaN

你可以向量化你的代码:

france_bdc = np.busdaycalendar(weekmask='1111100', holidays=['2023-05-18'])
m = df['Engagement'].notna()

df.loc[m, 'Date_engagement'] = (
    np.busday_offset(df.loc[m, 'Date_réception'].values.astype('datetime64[D]'),
                     df.loc[m, 'Engagement'], roll='backward', busdaycal=france_bdc)
)

输出:

>>> df
  Date_réception  Engagement Date_engagement
0     2023-05-12        10.0      2023-05-29
1     2023-05-13         NaN             NaT
qvtsj1bj

qvtsj1bj2#

也许这会起作用:

mask = df['Engagement'].notnull()

df.loc[mask, 'Date_engagement'] = np.busday_offset(
    df.loc[mask, 'Date_réception'].dt.date,
    df.loc[mask, 'Engagement'],
    roll='backward',
    busdaycal=france_bdc
)
2cmtqfgy

2cmtqfgy3#

虽然不是最有效的方法,但你可以试试:

dt_arr = np.busday_offset(
            df["Date_réception"].to_numpy().astype("datetime64[D]"),
            df["Engagement"].fillna(0), roll="backward", busdaycal=france_bdc)

df["Date_engagement"] = pd.Series(dt_arr).where(df["Engagement"].notna())

#620 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

示例:

np.random.seed(1)

df = pd.DataFrame({
    "Date_réception": pd.date_range("20230401", periods=30, freq="W"),
    "Engagement": np.random.choice(np.append(np.random.randint(5, 26, 25), [np.nan]),
                                   size=30, replace=True)}
)

france_bdc = np.busdaycalendar(
                holidays=["2023-01-01", "2023-04-17", "2023-05-01",
                          "2023-05-08", "2023-07-14", "2023-08-15",
                          "2023-11-01", "2023-11-11", "2023-12-25"]
)

输出:

print(df)

   Date_réception  Engagement Date_engagement
0      2023-04-02        23.0      2023-05-05
1      2023-04-09        23.0      2023-05-15
2      2023-04-16        21.0      2023-05-18
3      2023-04-23        10.0      2023-05-09
4      2023-04-30        23.0      2023-06-02
5      2023-05-07        10.0      2023-05-22
6      2023-05-14        19.0      2023-06-08
7      2023-05-21        18.0      2023-06-14
8      2023-05-28        21.0      2023-06-26
9      2023-06-04        21.0      2023-07-03
10     2023-06-11        20.0      2023-07-07
11     2023-06-18        19.0      2023-07-13
12     2023-06-25         NaN             NaT
13     2023-07-02        16.0      2023-07-25
14     2023-07-09        10.0      2023-07-24
15     2023-07-16        10.0      2023-07-28
16     2023-07-23         5.0      2023-07-28
17     2023-07-30         9.0      2023-08-10
18     2023-08-06        18.0      2023-08-31
19     2023-08-13        25.0      2023-09-18
20     2023-08-20        23.0      2023-09-20
21     2023-08-27         6.0      2023-09-04
22     2023-09-03         NaN             NaT
23     2023-09-10         5.0      2023-09-15
24     2023-09-17        20.0      2023-10-13
25     2023-09-24        13.0      2023-10-11
26     2023-10-01        10.0      2023-10-13
27     2023-10-08        15.0      2023-10-27
28     2023-10-15        10.0      2023-10-27
29     2023-10-22         NaN             NaT

相关问题