R语言用mgcv进行变量选择

xwbd5t1u 于 2023-01-22 发布在其他

关注(0)|答案(2)|浏览(192)

在R中是否有一种方法可以自动选择GAM的变量，类似于step？我已经阅读了step.gam和selection.gam的文档，但是我还没有看到一个答案，也没有看到可以工作的代码。另外，我尝试了method= "REML"和select = TRUE，但是都没有从模型中删除不重要的变量。
我已经建立了一个理论，我可以创建一个步骤模型，然后使用这些变量来创建GAM，但这似乎在计算上效率不高。
示例：

library(mgcv)

set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1), 
                  pred1 = rnorm(100, 10, 1), 
                  pred2 = rnorm(100, 0, 1), 
                  pred3 = rnorm(100, 0, 1), 
                  pred4 = rnorm(100, 0, 1))

model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
             data = dat, method = "REML", select = TRUE)

summary(model)

#Family: gaussian 
#Link function: identity 

#Formula:
#rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4)

#Parametric coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.02267    0.08426   0.269    0.788

#Approximate significance of smooth terms:
#            edf Ref.df     F p-value  
#s(pred1) 0.8770      9 0.212  0.1174  
#s(pred2) 1.8613      9 0.638  0.0374 *
#s(pred3) 0.5439      9 0.133  0.1406  
#s(pred4) 0.4504      9 0.091  0.1775  
---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#R-sq.(adj) =  0.0887   Deviance explained = 12.3%
#-REML = 129.06  Scale est. = 0.70996   n = 100

来源：https://stackoverflow.com/questions/38571145/variable-selection-with-mgcv

2条答案

按热度按时间

qnyhuwrf1#

Marra和Wood（2011年，计算统计和数据分析55;2372-2387）比较了GAM中的各种特征选择方法。他们得出结论，平滑度选择过程中的附加惩罚项给出了最佳结果。这可以在mgcv：：gam（）中通过使用select = TRUE参数/设置或以下任何变体来激活：

model <- gam(rsp ~ s(pred1,bs="ts") + s(pred2,bs="ts") + s(pred3,bs="ts") + s(pred4,bs="ts"), data = dat, method = "REML")
model <- gam(rsp ~ s(pred1,bs="cr") + s(pred2,bs="cr") + s(pred3,bs="cr") + s(pred4,bs="cr"),
             data = dat, method = "REML",select=T)
model <- gam(rsp ~ s(pred1,bs="cc") + s(pred2,bs="cc") + s(pred3,bs="cc") + s(pred4,bs="cc"),
             data = dat, method = "REML")
model <- gam(rsp ~ s(pred1,bs="tp") + s(pred2,bs="tp") + s(pred3,bs="tp") + s(pred4,bs="tp"), data = dat, method = "REML")

赞(0）回复(0）举报 2023-01-22

2uluyalo2#

除了在调用函数gam时指定select = TRUE之外，还可以增加参数gamma的值以获得更强的惩罚，例如，我们生成了一些数据：

library("mgcv")
set.seed(2) 
dat <- gamSim(1, n=400, dist="normal", scale=5)
## Gu & Wahba 4 term additive model

我们用“标准”惩罚和变量选择拟合GAM：

b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML")
summary(b)
##
## Family: gaussian 
## Link function: identity 
##
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.890      0.246   32.07   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Approximate significance of smooth terms:
##         edf Ref.df      F  p-value    
## s(x0) 1.363  1.640  0.804   0.3174    
## s(x1) 1.681  2.088 11.309 1.35e-05 ***
## s(x2) 5.931  7.086 16.240  < 2e-16 ***
## s(x3) 1.002  1.004  4.102   0.0435 *  
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## R-sq.(adj) =  0.253   Deviance explained = 27.1%
## -REML = 1212.5  Scale est. = 24.206    n = 400
par(mfrow = c(2, 2)) 
plot(b)

我们用更强的惩罚和变量选择来拟合GAM：

b2 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML", select = TRUE, gamma = 7)
## summary(b2)

## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.8898     0.2604    30.3   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Approximate significance of smooth terms:
##             edf Ref.df     F p-value    
## s(x0) 5.330e-05      9 0.000  0.1868    
## s(x1) 5.427e-01      9 0.967 7.4e-05 ***
## s(x2) 1.549e+00      9 6.210 < 2e-16 ***
## s(x3) 6.155e-05      9 0.000  0.0812 .  
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## R-sq.(adj) =  0.163   Deviance explained = 16.7%
## -REML = 179.46  Scale est. = 27.115    n = 400
plot(b2)

根据文档，增加gamma的值会生成更平滑的模型，因为它会增加GCV或UBRE/AIC准则中的有效自由度。
因此，一个可能的不利因素是，* 所有 * 非线性效应将向线性效应收缩，而 * 所有 * 线性效应将向零收缩。这也是我们在上面的图和输出中观察到的：随着gamma值的升高，一些效应实际上被抵消（edf值接近0，F值为0），而其他效应更接近线性（edf值更接近1）。

赞(0）回复(0）举报 2023-01-22

我来回答

R语言用mgcv进行变量选择

2条答案

相关问题

热门标签

最新问答

R语言 用mgcv进行变量选择

2条答案

相关问题

热门标签

最新问答

R语言用mgcv进行变量选择