pandas: how to make a lambda calculation conditional on column names

yv5phkfx  posted on 2023-03-16 in Other

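I want to perform a row-wise operation only on columns whose names match a specific pattern (specifically, names that end with 'vv').
A sample of my data: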

df.head()
per1-vh per1-vv per2-vh per2-vv per3-vh ... per19-vv    per20-vh    per20-vv    per23-vh    per23-vv
0   0.038960    0.151950    0.025226    0.045487    0.068463    ... 0.193544    0.025932    0.064348    0.008332    0.168142
1   0.044579    0.198568    0.028740    0.062431    0.019059    ... 0.168563    0.018869    0.108869    0.002971    0.031542
2   0.037556    0.075178    0.022924    0.122599    0.040780    ... 0.052556    0.026983    0.048267    0.005766    0.013224
3   0.056599    0.110329    0.051889    0.064278    0.022659    ... 0.096915    0.032442    0.089093    0.014281    0.080128
4   0.029285    0.118285    0.097123    0.169384    0.006140    ... 0.029767    0.023235    0.092769    0.007135    0.068446

I tried:

def calculate_variance(x):
    x.drop('target', axis=1)
    return x.var(axis=1)

for row in data_df:
    df = data_df.assign(Var_vv = lambda row: calculate_variance(row) if row.columns in ['vv'])

This produces:
SyntaxError: expected 'else' after 'if' expression
The expected result is a new column "Var" at the end of the dataset.

s5a0g9ez  #1

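Use DataFrame.filter with a regular expression anchored by $ to select the columns whose names end with vv. Because the target column does not end with vv, it is never selected, so the .drop call can be removed.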

df = df.assign(Var_vv = df.filter(regex='vv$').var(axis=1))
print (df)

    per1-vh   per1-vv   per2-vh   per2-vv   per3-vh  per19-vv  per20-vh  \
0  0.038960  0.151950  0.025226  0.045487  0.068463  0.193544  0.025932   
1  0.044579  0.198568  0.028740  0.062431  0.019059  0.168563  0.018869   
2  0.037556  0.075178  0.022924  0.122599  0.040780  0.052556  0.026983   
3  0.056599  0.110329  0.051889  0.064278  0.022659  0.096915  0.032442   
4  0.029285  0.118285  0.097123  0.169384  0.006140  0.029767  0.023235   

   per20-vv  per23-vh  per23-vv    Var_vv  
0  0.064348  0.008332  0.168142  0.004322  
1  0.108869  0.002971  0.031542  0.004903  
2  0.048267  0.005766  0.013224  0.001626  
3  0.089093  0.014281  0.080128  0.000301  
4  0.092769  0.007135  0.068446  0.002759
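
The approach above can be tried on a small frame. This is only a minimal sketch: the column values and the `target` column below are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical miniature of the asker's frame: two 'vv' columns, two 'vh'
# columns, and a 'target' column that must not enter the variance.
df = pd.DataFrame({
    "per1-vh": [1.0, 2.0],
    "per1-vv": [1.0, 2.0],
    "per2-vh": [5.0, 5.0],
    "per2-vv": [3.0, 6.0],
    "target": [0, 1],
})

# regex='vv$' keeps only the column labels that end in 'vv',
# so 'target' and the 'vh' columns are excluded without any drop().
vv_cols = df.filter(regex="vv$")

# axis=1 gives one variance per row (sample variance, ddof=1 by default).
result = df.assign(Var_vv=vv_cols.var(axis=1))
print(result)
```

Here row 0 uses the values 1.0 and 3.0 and row 1 uses 2.0 and 6.0, so Var_vv comes out as 2.0 and 8.0.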

Your solution, corrected:

def calculate_variance(x):
    # x.drop('target', axis=1) was never assigned back, so its result was discarded
    return x.var(axis=1)

df = df.assign(Var_vv = lambda x: calculate_variance(x.filter(regex='vv$')))
print (df)
    per1-vh   per1-vv   per2-vh   per2-vv   per3-vh  per19-vv  per20-vh  \
0  0.038960  0.151950  0.025226  0.045487  0.068463  0.193544  0.025932   
1  0.044579  0.198568  0.028740  0.062431  0.019059  0.168563  0.018869   
2  0.037556  0.075178  0.022924  0.122599  0.040780  0.052556  0.026983   
3  0.056599  0.110329  0.051889  0.064278  0.022659  0.096915  0.032442   
4  0.029285  0.118285  0.097123  0.169384  0.006140  0.029767  0.023235   

   per20-vv  per23-vh  per23-vv    Var_vv  
0  0.064348  0.008332  0.168142  0.004322  
1  0.108869  0.002971  0.031542  0.004903  
2  0.048267  0.005766  0.013224  0.001626  
3  0.089093  0.014281  0.080128  0.000301  
4  0.092769  0.007135  0.068446  0.002759
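
A callable passed to assign is called once with the whole DataFrame rather than once per row, which is why no for loop is needed. The sketch below illustrates this under stated assumptions: the helper name `row_variance_of_vv`, the `seen` list, and the sample values are hypothetical. It also selects the columns with a boolean mask from `columns.str.endswith`, as an alternative to the regex filter.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "per1-vv": [1.0, 2.0],
    "per1-vh": [9.0, 9.0],
    "per2-vv": [3.0, 6.0],
})

seen = []  # records what assign actually hands to the callable

def row_variance_of_vv(frame):
    # assign calls this once with the full DataFrame, not once per row
    seen.append(type(frame).__name__)
    # Boolean mask over the column labels; an alternative to filter(regex='vv$')
    mask = frame.columns.str.endswith("vv")
    return frame.loc[:, mask].var(axis=1)

out = df.assign(Var_vv=row_variance_of_vv)
print(seen)                     # ['DataFrame']
print(out["Var_vv"].tolist())   # [2.0, 8.0]
```

The mask version avoids regular expressions, while filter(regex='vv$') is shorter. Both select the same columns here.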
