Problem pooling estimates from multiply imputed data sets with MICE in R: zero-inflated Poisson

fdx2calv  posted on 2023-03-15  in Other

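I am trying to run a zero-inflated Poisson regression on a data frame whose missing values were imputed with mice(). My code runs the multiple imputation and pools the results without errors. However, when I summarize the pooled estimates, I cannot get the full results of the model. A zero-inflated Poisson model (zeroinfl()) has two components: one for the counts and one for the excess zeros in my data. I can only display one of the two components of the pooled model.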

library(dplyr)
library(mice)
library(pscl)
library(poissonreg)
library(countimp)

# Set the seed for reproducibility
set.seed(123)

# Simulate data with one count outcome and three variables
n <- 1000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
x3 <- rpois(n, 2)
y <- rpois(n, 1 + exp(0.5 * x1 + 0.8 * x2 + 0.3 * x3))

# Introduce missing data to the three variables
prop_missing <- 0.2
missing_x1 <- sample(c(TRUE, FALSE), size = n, 
               prob = c(prop_missing, 1 - prop_missing), replace = TRUE)
missing_x2 <- sample(c(TRUE, FALSE), size = n, 
               prob = c(prop_missing, 1 - prop_missing), replace = TRUE)
missing_x3 <- sample(c(TRUE, FALSE), size = n, 
               prob = c(prop_missing, 1 - prop_missing), replace = TRUE)
x1[missing_x1] <- NA
x2[missing_x2] <- NA
x3[missing_x3] <- NA

# Create a data frame with the simulated data
dat <- data.frame(y, x1, x2, x3)

# Run an initial (dry-run) imputation to obtain the default predictor matrix
ini <- mice(dat, m = 5, maxit = 0)
pred <- ini$predictorMatrix # extract the predictor matrix
pred[1, ] <- c(0, 2, 2, 3) # edit the predictor matrix

# Run the imputation with the edited predictor matrix and per-variable methods
imp.zip <- mice(dat, m = 5, maxit = 5, method = c("", "pmm", "pmm", "zip"),
                predictorMatrix = pred, seed = 1234, printFlag = TRUE)

# Fit the zero-inflated Poisson regression on each imputed data set
res.zinb <- with(imp.zip, zeroinfl(y ~ x1 + x2 | x3, dist = "poisson",
                 link = "logit"))
summary(pool(res.zinb)) # pool the fits and summarize
Answer 1 (zbq4xfa0)

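The problem

So I think this problem has to do with how mice::pool() is implemented. As far as I can tell, it does the following:

1. It calls an internal function named pool.fitlist. (GitHub source)
2. pool.fitlist takes the object of class mira and calls summary on it. (GitHub source)
3. pool.fitlist computes the pooled estimates. (GitHub source)

Then, when you call summary(pool(res.zinb)), it dispatches to summary.mipo (GitHub source), because pool(res.zinb) is of class mipo.

The summary function called in step (2) does not know how to display all components of a zeroinfl model, which is why summary(pool(...)) does not show the logit part of the model.

broom and broom.mixed do not implement tidy summaries for zeroinfl models either; you can check this by loading the broom.mixed package and running broom.mixed::get_methods().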

My solution: explanation

poissonreg::tidy() solves the problem in step (2):

> fitlist <- mice::getfit(res.zinb)
> poissonreg::tidy(fitlist[[1]], type="all")
# A tibble: 5 × 6
  term        type  estimate std.error statistic   p.value
  <chr>       <chr>    <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept) count    1.18     0.0256     46.2  0        
2 x1          count    0.393    0.0152     25.9  2.36e-148
3 x2          count    0.541    0.0309     17.5  7.77e- 69
4 (Intercept) zero    -2.95     0.505      -5.83 5.56e-  9
5 x3          zero    -1.04     0.482      -2.15 3.13e-  2

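The solution I propose is basically to do what pool() does by hand, in the following steps:
1. Collect the tidy tibble for each zeroinfl model in the fitlist above.
2. For each (term, type) group, call mice::pool.scalar to compute the pooled estimates. This is the same operation performed in pool.fitlist, but I believe this function is provided for exactly this use case (see the pool documentation on RDocumentation).
3. Using the results of pool.scalar, compute the pooled estimate, standard error, statistic, and p-value in the same way summary.mipo does.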
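
To make steps 2 and 3 more concrete, here is a minimal base-R sketch of Rubin's rules for a single parameter, which is my understanding of what pool.scalar and summary.mipo compute when no small-sample correction is applied. The helper name pool_rubin and the input numbers are made up for illustration; they are not part of mice and not taken from the models above.

```r
# Hypothetical helper: Rubin's rules for one scalar parameter.
# est: per-imputation point estimates; se: their standard errors.
pool_rubin <- function(est, se) {
  m    <- length(est)               # number of imputed data sets
  qbar <- mean(est)                 # pooled point estimate
  ubar <- mean(se^2)                # within-imputation variance
  b    <- var(est)                  # between-imputation variance
  t    <- ubar + (1 + 1 / m) * b    # total variance
  r    <- (1 + 1 / m) * b / ubar    # relative increase in variance
  df   <- (m - 1) * (1 + 1 / r)^2   # large-sample df (infinite n assumed)
  stat <- qbar / sqrt(t)
  data.frame(estimate = qbar, std.error = sqrt(t), statistic = stat, df = df,
             p.value = 2 * pt(abs(stat), df, lower.tail = FALSE))
}

# Made-up inputs for illustration only
est <- c(0.39, 0.38, 0.40, 0.37, 0.38)
se  <- c(0.015, 0.016, 0.015, 0.015, 0.016)
pooled_x <- pool_rubin(est, se)
pooled_x
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation variance is added on top.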

My solution: implementation

Here is the complete implementation of the three steps above:

# Step 1
fitlist <- mice::getfit(res.zinb)
tidylist <- lapply(fitlist, function(fit) poissonreg::tidy(fit, type = "all"))
w <- bind_rows(tidylist)

# Step 2
# Convenience wrapper function around pool.scalar.
# pool.scalar also returns a "qhat" and "u" which are vectors, 
# and we don't need them. Those vectors mess up the format of
# the summary that we want to compute later.
wrap.pool.scalar <- function(estimates, variances, n, k) {
  pool_res <- pool.scalar(estimates, variances, n = n, k = k)

  return(as_tibble(list(
    qbar = pool_res$qbar, 
    ubar = pool_res$ubar, 
    b = pool_res$b, 
    t = pool_res$t, 
    df = pool_res$df, 
    r = pool_res$r, 
    fmi = pool_res$fmi)))
}

# For each (term,type) pair, compute pooled univariate estimates using 
# wrap.pool.scalar 
pooled <- w %>% group_by(term, type) %>% 
  # n is hard-coded here but you should probably replace it with 
  # your n from above.
  reframe(wrap.pool.scalar(estimate, std.error^2, n=1000, k=1)) %>% 
  mutate(estimate = qbar)
pooled

# Step 3
# Copy the pooled estimate calculations from
# https://github.com/amices/mice/blob/master/R/mipo.R#L69-L71
pooled_summary <- pooled %>% mutate(
  std.error = sqrt(t), 
  statistic = estimate / std.error,
  p.value = 2 * (pt(abs(statistic), pmax(df, 0.001), lower.tail = FALSE))) %>% 
  dplyr::select(term, type, estimate, std.error, statistic, df, p.value)
pooled_summary

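Sanity check

We can check that the estimates and standard errors reported by summary(pool(res.zinb)) for the count component match those in pooled_summary. The df and p.value columns differ slightly, most likely because pool() assumed an infinite sample size (see the warning in its output), whereas n = 1000 was passed to pool.scalar: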

> summary(pool(res.zinb))
         term  estimate  std.error statistic        df      p.value
1 (Intercept) 1.1867116 0.02927839  40.53200  68.26755 1.723856e-49
2          x1 0.3844206 0.01860580  20.66134  33.92570 8.515891e-21
3          x2 0.5229170 0.03402354  15.36927 126.41155 1.032634e-30
Warning message:
In get.dfcom(object, dfcom) : Infinite sample size assumed.
> pooled_summary
# A tibble: 5 × 7
  term        type  estimate std.error statistic    df  p.value
  <chr>       <chr>    <dbl>     <dbl>     <dbl> <dbl>    <dbl>
1 (Intercept) count    1.19     0.0293     40.5   62.6 1.20e-46
2 (Intercept) zero    -2.79     0.464      -6.02 525.  3.25e- 9
3 x1          count    0.384    0.0186     20.7   32.3 3.62e-20
4 x2          count    0.523    0.0340     15.4  110.  4.20e-29
5 x3          zero    -1.07     0.433      -2.47 594.  1.38e- 2
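
The gap between the two df columns can be reproduced by hand. The sketch below applies what I understand to be the Barnard-Rubin small-sample adjustment that mice uses when a finite n is supplied. The helper name adjust_df is hypothetical; the inputs are the df values printed by summary(pool(res.zinb)) above, together with the m = 5, n = 1000 and k = 1 used earlier.

```r
# Hypothetical helper: Barnard-Rubin adjusted degrees of freedom.
# df_inf: df under the infinite-sample assumption; m: number of imputations;
# n: sample size; k: number of fitted parameters (as passed to pool.scalar).
adjust_df <- function(df_inf, m, n, k) {
  lambda <- sqrt((m - 1) / df_inf)  # recovered from df_inf = (m - 1) / lambda^2
  dfcom  <- n - k                   # complete-data degrees of freedom
  dfobs  <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  df_inf * dfobs / (df_inf + dfobs) # combine the two sources of df
}

# df column printed by summary(pool(res.zinb)) above
df_inf <- c(`(Intercept)` = 68.26755, x1 = 33.92570, x2 = 126.41155)
df_adj <- adjust_df(df_inf, m = 5, n = 1000, k = 1)
round(df_adj, 1)
```

The adjusted values should land close to the df column of pooled_summary for the count terms, which would explain the small differences in the p-values as well.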
