Calculating the carryover effect in Python

hgqdbh6s  posted on 2023-02-18  in Python

I want to calculate the carryover effect for TV advertising GRP data. My input data looks like this:

Variable       Date  Causal  Half_Life
0     TV Model 2016-01-10       0          4
1     TV Model 2016-01-17       0          4
2     TV Model 2016-01-24       0          4
3     TV Model 2016-01-31     100          4
4     TV Model 2016-02-07     110          4
5     TV Model 2016-02-14      89          4
6     TV Model 2016-02-21      57          4
7     TV Model 2016-02-28      90          4
8   TV General 2016-01-10       0          4
9   TV General 2016-01-17       0          4
10  TV General 2016-01-24       0          4
11  TV General 2016-01-31      30          4
12  TV General 2016-02-07      32          4
13  TV General 2016-02-14      42          4
14  TV General 2016-02-21      39          4
15  TV General 2016-02-28      55          4

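I want to compute a new column df['Adstock'] according to the following rules:
If the row is the first row of its df.Variable group, then df.Adstock = df.Causal. Otherwise, df.Adstock = df.Causal + 0.5**(1/df.Half_Life) * df.Adstock from the previous row.
I am using the code below: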

import pandas as pd
import numpy as np
import numpy.random as random
import statsmodels.api as sm
import statsmodels.tsa as tsa
import statsmodels.formula.api as smf
import datetime

df = pd.read_excel('RC Data.xlsx')

df['Adstock'] = 0

df['Adstock'] = np.where(df['Variable'] == df['Variable'].shift(1), df['Adstock'].shift(1)*(0.5**(1/df['Half_Life'])) + df['Causal'], df['Causal'])

The output I get looks like this:

Variable       Date  Causal  Half_Life  Adstock
0     TV Model 2016-01-10       0          4      0.0
1     TV Model 2016-01-17       0          4      0.0
2     TV Model 2016-01-24       0          4      0.0
3     TV Model 2016-01-31     100          4    100.0
4     TV Model 2016-02-07     110          4    110.0
5     TV Model 2016-02-14      89          4     89.0
6     TV Model 2016-02-21      57          4     57.0
7     TV Model 2016-02-28      90          4     90.0
8   TV General 2016-01-10       0          4      0.0
9   TV General 2016-01-17       0          4      0.0
10  TV General 2016-01-24       0          4      0.0
11  TV General 2016-01-31      30          4     30.0
12  TV General 2016-02-07      32          4     32.0
13  TV General 2016-02-14      42          4     42.0
14  TV General 2016-02-21      39          4     39.0
15  TV General 2016-02-28      55          4     55.0
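
The Adstock column above is simply a copy of Causal. A likely explanation is that the `np.where` expression is evaluated once over whole columns: `df['Adstock'].shift(1)` is taken from the column that was just initialised to 0, so the carried-over term is zero on every row and nothing accumulates. A minimal sketch of that effect, using a short made-up series rather than the real file:

```python
import pandas as pd

causal = pd.Series([0, 100, 110, 89])          # illustrative values only
adstock = pd.Series(0, index=causal.index)     # the freshly initialised column
decay = 0.5 ** (1 / 4)                         # Half_Life = 4

# The shift reads the all-zero column, not values computed "so far",
# so the carried-over term is 0 for every row.
carried = adstock.shift(1, fill_value=0) * decay
result = carried + causal

print(result.tolist())
```

Because each row needs the already-updated value of the previous row, the calculation has to be done as a running recurrence rather than as a single column expression.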

But the desired output should look like this:

Variable       Date  Causal  Half_Life     Adstock
0     TV Model 2016-01-10       0          4    0.000000
1     TV Model 2016-01-17       0          4    0.000000
2     TV Model 2016-01-24       0          4    0.000000
3     TV Model 2016-01-31     100          4  100.000000
4     TV Model 2016-02-07     110          4  194.089642
5     TV Model 2016-02-14      89          4  252.209284
6     TV Model 2016-02-21      57          4  269.081883
7     TV Model 2016-02-28      90          4  316.269991
8   TV General 2016-01-10       0          4    0.000000
9   TV General 2016-01-17       0          4    0.000000
10  TV General 2016-01-24       0          4    0.000000
11  TV General 2016-01-31      30          4   30.000000
12  TV General 2016-02-07      32          4   57.226892
13  TV General 2016-02-14      42          4   90.121889
14  TV General 2016-02-21      39          4  114.783173
15  TV General 2016-02-28      55          4  151.520759
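
The desired values follow directly from the recurrence stated earlier. As a quick sanity check in plain Python (the variable names here are only for illustration), the TV Model rows can be reproduced with `itertools.accumulate`, which keeps the first value as-is and then combines each new value with the previous result:

```python
from itertools import accumulate

half_life = 4
decay = 0.5 ** (1 / half_life)  # weekly retention factor, about 0.8409

tv_model_causal = [0, 0, 0, 100, 110, 89, 57, 90]

# First element stays equal to Causal; afterwards
# adstock = causal + decay * previous adstock.
tv_model_adstock = list(
    accumulate(tv_model_causal, lambda prev, cur: cur + decay * prev)
)

for causal, adstock in zip(tv_model_causal, tv_model_adstock):
    print(f"{causal:>4} {adstock:12.6f}")
```

For example, the row for 2016-02-07 is 110 + 0.8409 * 100, which gives roughly 194.09.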

Please help.


chhkpiq4 #1

Here is my solution. I think this is hard to vectorize.

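l = []
# sort=False keeps the groups in order of first appearance
for x, y in df.groupby('Variable', sort=False):
    l1 = []
    for s, t in y.iterrows():
        if len(l1) == 0:
            # first row of the group: adstock equals Causal
            l1.append(t['Causal'])
        else:
            # carry over the previous adstock, decayed by 0.5**(1/Half_Life)
            l1.append(t['Causal'] + 0.5**(1/t['Half_Life'])*l1[-1])
    l.extend(l1)
df['New'] = l
df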
Out[982]: 
     Variable        Date  Causal  Half_Life         New
0     TVModel  2016-01-10       0          4    0.000000
1     TVModel  2016-01-17       0          4    0.000000
2     TVModel  2016-01-24       0          4    0.000000
3     TVModel  2016-01-31     100          4  100.000000
4     TVModel  2016-02-07     110          4  194.089642
5     TVModel  2016-02-14      89          4  252.209284
6     TVModel  2016-02-21      57          4  269.081883
7     TVModel  2016-02-28      90          4  316.269991
8   TVGeneral  2016-01-10       0          4    0.000000
9   TVGeneral  2016-01-17       0          4    0.000000
10  TVGeneral  2016-01-24       0          4    0.000000
11  TVGeneral  2016-01-31      30          4   30.000000
12  TVGeneral  2016-02-07      32          4   57.226892
13  TVGeneral  2016-02-14      42          4   90.121889
14  TVGeneral  2016-02-21      39          4  114.783173
15  TVGeneral  2016-02-28      55          4  151.520759
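
If Half_Life is constant within each group (an assumption that holds for the sample data but may not hold in general), the recurrence has a closed form that can be vectorized with NumPy: adstock_t = decay**t * cumsum(causal_k / decay**k). The sketch below is one possible way to do this; `adstock_closed_form` is a hypothetical helper, and the data frame is rebuilt from the question's sample rather than read from the Excel file.

```python
import numpy as np
import pandas as pd

def adstock_closed_form(causal, half_life):
    """Vectorized adstock for ONE group with a constant half-life."""
    decay = 0.5 ** (1.0 / half_life)
    x = np.asarray(causal, dtype=float)
    t = np.arange(len(x))
    # y_t = sum_{k<=t} decay**(t-k) * x_k = decay**t * cumsum(x_k / decay**k)
    return decay ** t * np.cumsum(x / decay ** t)

df = pd.DataFrame({
    "Variable": ["TV Model"] * 8 + ["TV General"] * 8,
    "Causal": [0, 0, 0, 100, 110, 89, 57, 90,
               0, 0, 0, 30, 32, 42, 39, 55],
    "Half_Life": 4,
})

parts = []
for _, g in df.groupby("Variable", sort=False):
    # Assumption: the half-life of the group's first row applies to the whole group.
    values = adstock_closed_form(g["Causal"], g["Half_Life"].iloc[0])
    parts.append(pd.Series(values, index=g.index))

df["Adstock"] = pd.concat(parts)  # aligned back to df by index
print(df)
```

Dividing by decay**t grows quickly for long series, so this form is best suited to short groups like the ones here; the explicit loop remains the safer choice otherwise.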

p1tboqfb #2

import numpy as np

def decay(df, row_id):
    # Adstock for one row: this row's Causal plus the decayed adstock of the previous row.
    causal_value = df.at[row_id, "Causal"]
    half_life = df.at[row_id, "Half_Life"]
    ad_stock_value = df.at[row_id - 1, "adstock_value"]
    return causal_value + 0.5 ** (1 / half_life) * ad_stock_value

def adstock(df):
    # Adds the column "adstock_value" to df in place.
    # Assumes a default 0..n-1 index and that the rows of each Variable are contiguous.
    df.loc[:, "adstock_value"] = np.nan
    visited = set()

    for i in range(len(df)):
        var = df.at[i, "Variable"]
        if var in visited:
            df.at[i, "adstock_value"] = decay(df, i)
        else:
            # first row of the group: adstock equals Causal
            visited.add(var)
            df.at[i, "adstock_value"] = df.at[i, "Causal"]

adstock(df)
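
The function above looks up the previous row with `row_id - 1`, so it relies on a default integer index and on each Variable's rows sitting next to each other. A minimal sketch that drops both assumptions by working group by group is shown below; `adstock_series` is a hypothetical helper, the sample frame deliberately interleaves the two variables under a non-default index, and rows are assumed to already be in date order within each group.

```python
import pandas as pd

def adstock_series(group):
    """Run the adstock recurrence over one group, honouring per-row Half_Life."""
    values = []
    prev = None
    for causal, half_life in zip(group["Causal"], group["Half_Life"]):
        if prev is None:
            current = float(causal)                      # first row of the group
        else:
            current = causal + 0.5 ** (1 / half_life) * prev
        values.append(current)
        prev = current
    return pd.Series(values, index=group.index)

model = [0, 0, 0, 100, 110, 89, 57, 90]
general = [0, 0, 0, 30, 32, 42, 39, 55]

df = pd.DataFrame(
    {
        "Variable": ["TV Model", "TV General"] * 8,      # interleaved rows
        "Causal": [v for pair in zip(model, general) for v in pair],
        "Half_Life": 4,
    },
    index=range(100, 116),                               # non-default index
)

df["Adstock"] = pd.concat(
    [adstock_series(g) for _, g in df.groupby("Variable", sort=False)]
)
print(df)
```

Because each group's result keeps the original index labels, the assignment lines the values back up with the right rows regardless of how the groups are ordered in the frame.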
