R语言 如何用emmeans输出多项式回归系数?

w8ntj3qf  于 11个月前  发布在  其他
关注(0)|答案(1)|浏览(104)
ml <- foreign::read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
model.fit <- multinom(ses~female*prog+read, data=ml)

字符串
我有以下系数:

Call:
multinom(formula = ses ~ female * prog + read, data = ml)

Coefficients:
       (Intercept) femalefemale progacademic progvocation       read femalefemale:progacademic
middle   -1.919435  -0.01626812     1.271990     1.301494 0.04404166                 -1.334239
high     -4.701536   0.44126239     2.092619     1.171212 0.07779108                 -1.815047
       femalefemale:progvocation
middle                -0.6859539
high                  -1.5431251

Residual Deviance: 386.2441 
AIC: 414.2441


优势比exp(1.271990)=3.567946exp(2.092619)=8.106117
但是,使用emmeans包,

EMM <- emmeans(model.fit, revpairwise~ses|female*prog)
REMM <- regrid(EMM, transform="logit")
contrast(REMM, interaction="revpairwise", type="response", by="female")


我只能得到以下输出

female = male:
 ses_revpairwise prog_revpairwise    odds.ratio     SE df null t.ratio p.value
 middle / low    academic / general       4.926  5.380 14    1   1.460  0.1663
 high / low      academic / general      15.314 16.723 14    1   2.499  0.0255
 high / middle   academic / general       3.109  3.174 14    1   1.111  0.2853
 middle / low    vocation / general       7.385  9.044 14    1   1.633  0.1248
 high / low      vocation / general       4.430  5.496 14    1   1.200  0.2501
 high / middle   vocation / general       0.600  0.780 14    1  -0.393  0.7002
 middle / low    vocation / academic      1.499  1.699 14    1   0.357  0.7262
 high / low      vocation / academic      0.289  0.326 14    1  -1.099  0.2903
 high / middle   vocation / academic      0.193  0.222 14    1  -1.431  0.1745

female = female:
 ses_revpairwise prog_revpairwise    odds.ratio     SE df null t.ratio p.value
 middle / low    academic / general       0.885  0.826 14    1  -0.131  0.8977
 high / low      academic / general       1.475  1.429 14    1   0.401  0.6943
 high / middle   academic / general       1.667  1.645 14    1   0.517  0.6130
 middle / low    vocation / general       3.075  3.245 14    1   1.065  0.3051
 high / low      vocation / general       0.656  0.750 14    1  -0.369  0.7177
 high / middle   vocation / general       0.213  0.262 14    1  -1.259  0.2287
 middle / low    vocation / academic      3.475  3.327 14    1   1.301  0.2144
 high / low      vocation / academic      0.445  0.446 14    1  -0.808  0.4324
 high / middle   vocation / academic      0.128  0.135 14    1  -1.947  0.0719

Tests are performed on the log odds ratio scale


OR不同:4.926 vs 3.567946和15.314 vs 8.106117。
这是怎么回事?
与多项式系数的exp相同的输出

vuktfyat

vuktfyat1#

多项模型的系数不是比值比,它们是对数标度,而不是logit标度。
我将演示如何使用模型预测进行手动计算。首先,使用因子组合创建一个网格。我们还将read设置为其均值,即52.23。

G = expand.grid(female = c("male", "female"), 
                prog = c("general", "academic", "vocation"), 
                read = 52.23)

字符串
现在,让我们获得这个网格上的预测概率:

( probs = predict(model.fit, newdata = G, type = "probs") )
##          low    middle      high
## 1 0.33426672 0.4892138 0.1765194
## 2 0.30666233 0.4415713 0.2517663
## 3 0.09521526 0.4971990 0.4075858
## 4 0.29097972 0.3937028 0.3153175
## 5 0.12373582 0.6654763 0.2107879
## 6 0.23636138 0.6298578 0.1337808


从这些,我们将这些概率转换为赔率:

( odds = probs / (1 - probs) )
##         low    middle      high
## 1 0.5021031 0.9577664 0.2143578
## 2 0.4422987 0.7907390 0.3364809
## 3 0.1052353 0.9888583 0.6880081
## 4 0.4103969 0.6493561 0.4605310
## 5 0.1412084 1.9893246 0.2670865
## 6 0.3095199 1.7016644 0.1544422


最后,我们获得这些比值的比值,并将合并与G结合起来,以显示相关的因子水平:

( OR = cbind(G, rat21 = odds[,2]/odds[,1], rat31 = odds[,3]/odds[,1]) )
##   female     prog  read     rat21     rat31
## 1   male  general 52.23  1.907510 0.4269199
## 2 female  general 52.23  1.787794 0.7607550
## 3   male academic 52.23  9.396644 6.5378090
## 4 female academic 52.23  1.582264 1.1221600
## 5   male vocation 52.23 14.087868 1.8914355
## 6 female vocation 52.23  5.497754 0.4989734


上述计算都没有使用emmeans包;只是模型预测。所以现在我们验证使用emmeans()获得了相同的结果。首先,为相同的因子组合创建emmGrid对象:

emm = emmeans(model.fit, ~ ses | female * prog)

然后,在重新网格化到logit量表后,获得这些估计值的Dunnett对比:

contrast(regrid(emm, "logit"), "dunnett", type = "resp")
## female = male, prog = general:
##  contrast     odds.ratio     SE df null t.ratio p.value
##  middle / low      1.908  1.617 14    1   0.762  0.6702
##  high / low        0.427  0.367 14    1  -0.990  0.5294
## 
## female = female, prog = general:
##  contrast     odds.ratio     SE df null t.ratio p.value
##  middle / low      1.788  1.371 14    1   0.758  0.6726
##  high / low        0.761  0.611 14    1  -0.340  0.9075
## 
## female = male, prog = academic:
##  contrast     odds.ratio     SE df null t.ratio p.value
##  middle / low      9.397  6.486 14    1   3.246  0.0112
##  high / low        6.538  4.473 14    1   2.744  0.0299
## 
## female = female, prog = academic:
##  contrast     odds.ratio     SE df null t.ratio p.value
##  middle / low      1.582  0.816 14    1   0.890  0.5902
##  high / low        1.122  0.579 14    1   0.223  0.9542
## 
## female = male, prog = vocation:
##  contrast     odds.ratio     SE df null t.ratio p.value
##  middle / low     14.088 12.602 14    1   2.957  0.0198
##  high / low        1.891  1.688 14    1   0.714  0.6996
## 
## female = female, prog = vocation:
##  contrast     odds.ratio     SE df null t.ratio p.value
##  middle / low      5.498  4.363 14    1   2.148  0.0914
##  high / low        0.499  0.423 14    1  -0.820  0.6339
## 
## P value adjustment: dunnettx method for 2 tests 
## Tests are performed on the log odds ratio scale

创建于2023-12-30使用reprex v2.0.2
这些结果与之前得到的OR矩阵相同,验证了emmeans()的正确性。请注意,这些比值比都不等于3.567946或8.106117。这是因为回归系数不是比值比的对数。
我将演示其中一种情况来说明它是如何工作的。考虑第一种情况,其中female = "male"program = "general"read = 52.53。在这里,只有read的截距和系数是相关的,因为其他所有变量的虚拟变量都是零。我们有两行系数,分别为middlehigh; low的线性预测值都被认为是零。因此,我们有以下多项式响应的线性预测值:

coef(model.fit)
##        (Intercept) femalefemale progacademic progvocation       read ...
## middle   -1.919435  -0.01626812     1.271990     1.301494 0.04404166 ...
## high     -4.701536   0.44126239     2.092619     1.171212 0.07779108 ...

( lp = c(lo = 0, 
         mid = -1.919435 + 0.04404166 * 52.23, 
         hi = -4.701536 + 0.07779108 * 52.53) )
##         lo        mid         hi 
##  0.0000000  0.3808609 -0.6151706

如前所述,线性预测是对数尺度的。为了对它们进行反变换,我们对它们取幂,然后除以总和,使它们的总和为1:

exp(lp) / sum(exp(lp))
##        lo       mid        hi 
## 0.3328792 0.4871834 0.1799374

请注意,这些是相同的(四舍五入时有轻微的小问题)作为我们从predict()获得的prob的第一行。考虑到这是如何工作的,如果只有两个水平的多项式响应,那么线性预测变量的形式为c(0, b),第二层的概率是exp(b) / (1 + exp(b)),所以b确实是logit,但这只对两个多项式类成立。
此外,还有一个评论说,不清楚OP为什么选择progacademic的系数而不是截距,也不清楚为什么选择交互作用对比。

相关问题