Is there an R function that removes all variables with a p-value above 5% from a regression?

6ju8rftf, posted 2023-01-06 in Other

Recently I ran into the problem that removing all of the non-significant variables from a model takes a lot of time. I tried writing a function myself, but I would be happy to get some suggestions. Ideally the function would remove the variables one at a time, always dropping the one with the highest p-value, until every remaining variable is significant at the 5% level.
This is my "function" so far:

# coefficient table of the fitted model, as a data frame
x <- summary(model_test1)$coefficients
x <- as.data.frame(x)
# return the name of the term with the largest p-value
# (note: rownames(which(...)) on a plain vector is NULL, so index rownames(x) instead)
max_p <- function(x) {
  nameofmax <- rownames(x)[which.max(x$`Pr(>|t|)`)]
  return(nameofmax)
}
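For context, here is a minimal usage sketch of that helper on mtcars (the dataset used in the answer below); model_test1 is just an assumed placeholder name, and the expected result is taken from the answer's own verbose output:

# sketch only: fit a full model on mtcars and find the worst term
model_test1 <- lm(mpg ~ ., data = mtcars)
x <- as.data.frame(summary(model_test1)$coefficients)
max_p(x)
# [1] "cyl"   (cyl has the largest p-value, ~0.916, in the full model)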

iszxjhcz 1#

First off, this is a simple (arguably naive) stepwise-reduction approach. There are certainly better methods out there, most of them taught in statistics courses (advanced ones, or at least "robust" ones).
But in the meantime, try this.
To get things started, I never allow the intercept to be dropped. That is a judgment call, and usually a safe one, but there can be cases where removing it is worth considering. By the time you reach that point, I suspect you will have more tools in your toolbox for reasoning about it. (So for now, I always keep it.)

fun <- function(data, frm, threshold = 0.05, verbose = FALSE) {
  # default formula: first column is the response, all other columns are predictors
  if (missing(frm)) frm <- reformulate(names(data)[-1], response = names(data)[1])
  while (TRUE) {
    if (verbose) print(frm)
    mdl <- lm(frm, data = data)
    coef <- summary(mdl)$coefficients
    if (verbose) print(coef)
    # p-values (last column), excluding the intercept so it is never dropped
    coef <- coef[rownames(coef) != "(Intercept)", ncol(coef)]
    drop <- which.max(coef)
    # only drop the worst term if its p-value exceeds the threshold
    drop <- drop[coef[drop] > threshold]
    if (length(drop)) {
      if (verbose) message(paste("## drop:", names(drop), "=", round(coef[drop], 3)))
      frm <- drop.terms(terms(mdl), drop, keep.response = TRUE)
      attributes(frm) <- NULL # only to keep verbose printing clean
    } else break
  }
  list(formula = frm, model = mdl)
}

A demo on mtcars:

out <- fun(mtcars, mpg ~ .)
out$formula
# mpg ~ wt + qsec + am
summary(out$model)
# Call:
# lm(formula = frm, data = data)
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.4811 -1.5555 -0.7257  1.4110  4.6610 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   9.6178     6.9596   1.382 0.177915    
# wt           -3.9165     0.7112  -5.507 6.95e-06 ***
# qsec          1.2259     0.2887   4.247 0.000216 ***
# am            2.9358     1.4109   2.081 0.046716 *  
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 2.459 on 28 degrees of freedom
# Multiple R-squared:  0.8497,  Adjusted R-squared:  0.8336 
# F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

If you want to see what it does at each step, run it with verbose = TRUE.

out <- fun(mtcars, mpg ~ ., verbose = TRUE)
# mpg ~ .
# <environment: 0x0000021eb97c5f00>
#                Estimate  Std. Error    t value   Pr(>|t|)
# (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
# cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
# disp         0.01333524  0.01785750  0.7467585 0.46348865
# hp          -0.02148212  0.02176858 -0.9868407 0.33495531
# drat         0.78711097  1.63537307  0.4813036 0.63527790
# wt          -3.71530393  1.89441430 -1.9611887 0.06325215
# qsec         0.82104075  0.73084480  1.1234133 0.27394127
# vs           0.31776281  2.10450861  0.1509915 0.88142347
# am           2.52022689  2.05665055  1.2254035 0.23398971
# gear         0.65541302  1.49325996  0.4389142 0.66520643
# carb        -0.19941925  0.82875250 -0.2406258 0.81217871
# ## drop: cyl = 0.916
# mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
#                Estimate  Std. Error    t value   Pr(>|t|)
# (Intercept) 10.96007405 13.53030251  0.8100391 0.42659327
# disp         0.01282839  0.01682215  0.7625891 0.45380797
# hp          -0.02190885  0.02091131 -1.0477031 0.30615002
# drat         0.83519652  1.53625251  0.5436584 0.59214373
# wt          -3.69250814  1.83953550 -2.0073046 0.05715727
# qsec         0.84244138  0.68678068  1.2266527 0.23291993
# vs           0.38974986  1.94800204  0.2000767 0.84325850
# am           2.57742789  1.94034563  1.3283344 0.19768373
# gear         0.71155439  1.36561933  0.5210489 0.60753821
# carb        -0.21958316  0.78855537 -0.2784626 0.78325783
# ## drop: vs = 0.843
# mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
#                Estimate  Std. Error    t value   Pr(>|t|)
# (Intercept)  9.76827789 11.89230469  0.8213949 0.41985460
# disp         0.01214441  0.01612373  0.7532010 0.45897019
# hp          -0.02095020  0.01992567 -1.0514175 0.30398892
# drat         0.87509822  1.49112525  0.5868710 0.56300717
# wt          -3.71151106  1.79833544 -2.0638592 0.05049085
# qsec         0.91082822  0.58311935  1.5619928 0.13194532
# am           2.52390094  1.88128007  1.3415870 0.19282690
# gear         0.75984464  1.31577205  0.5774896 0.56921947
# carb        -0.24796312  0.75933250 -0.3265541 0.74695821
# ## drop: carb = 0.747
# mpg ~ disp + hp + drat + wt + qsec + am + gear
#                Estimate  Std. Error    t value    Pr(>|t|)
# (Intercept)  9.19762837 11.54220381  0.7968693 0.433339841
# disp         0.01551976  0.01214235  1.2781513 0.213420001
# hp          -0.02470716  0.01596302 -1.5477746 0.134763097
# drat         0.81022794  1.45006779  0.5587518 0.581507634
# wt          -4.13065054  1.23592980 -3.3421401 0.002717119
# qsec         1.00978651  0.48883274  2.0657097 0.049814778
# am           2.58979984  1.83528342  1.4111171 0.171042438
# gear         0.60644020  1.20596266  0.5028681 0.619640616
# ## drop: gear = 0.62
# mpg ~ disp + hp + drat + wt + qsec + am
#                Estimate  Std. Error    t value    Pr(>|t|)
# (Intercept) 10.71061639 10.97539399  0.9758753 0.338475309
# disp         0.01310313  0.01098299  1.1930387 0.244054196
# hp          -0.02179818  0.01465399 -1.4875257 0.149381426
# drat         1.02065283  1.36747598  0.7463772 0.462401185
# wt          -4.04454214  1.20558182 -3.3548467 0.002536163
# qsec         0.99072948  0.48002393  2.0639168 0.049550895
# am           2.98468801  1.63382423  1.8268110 0.079692318
# ## drop: drat = 0.462
# mpg ~ disp + hp + wt + qsec + am
#                Estimate Std. Error   t value    Pr(>|t|)
# (Intercept) 14.36190396 9.74079485  1.474408 0.152378367
# disp         0.01123765 0.01060333  1.059823 0.298972150
# hp          -0.02117055 0.01450469 -1.459565 0.156387279
# wt          -4.08433206 1.19409972 -3.420428 0.002075008
# qsec         1.00689683 0.47543287  2.117853 0.043907652
# am           3.47045340 1.48578009  2.335779 0.027487809
# ## drop: disp = 0.299
# mpg ~ hp + wt + qsec + am
#                Estimate Std. Error   t value    Pr(>|t|)
# (Intercept) 17.44019110  9.3188688  1.871492 0.072149342
# hp          -0.01764654  0.0141506 -1.247052 0.223087932
# wt          -3.23809682  0.8898986 -3.638726 0.001141407
# qsec         0.81060254  0.4388703  1.847021 0.075731202
# am           2.92550394  1.3971471  2.093913 0.045790788
# ## drop: hp = 0.223
# mpg ~ wt + qsec + am
#              Estimate Std. Error   t value     Pr(>|t|)
# (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
# wt          -3.916504  0.7112016 -5.506882 6.952711e-06
# qsec         1.225886  0.2886696  4.246676 2.161737e-04
# am           2.935837  1.4109045  2.080819 4.671551e-02

I should note that in this final model, disp (displacement) has no effect on fuel efficiency (mpg), which seems counter-intuitive. I have not dug into this dataset far enough to explain that, but it is a reminder to be careful with this "always drop the highest p-value" approach rather than accepting its results with absolute certainty.
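As one way to act on that caution (and on the "better methods exist" remark above), here is a minimal sketch, not part of the answer itself, that compares the result with base R's AIC-based backward elimination via step(). The names full and aic_model are illustrative; the two selection criteria differ, so the approaches need not agree in general.

# sketch only: compare with AIC-based backward elimination from base R
full <- lm(mpg ~ ., data = mtcars)
aic_model <- step(full, direction = "backward", trace = 0)
formula(aic_model)        # terms selected by AIC
out <- fun(mtcars, mpg ~ .)
out$formula               # terms selected by the p-value rule above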
