R: predictorMatrix and custom imputation fail in mice

Posted by qaxu7uf2 on 2023-10-13 in Other

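I am imputing three columns X1, X2, X3 based on three other columns Y1, Y2, Y3. I prefer a custom imputation over pmm from mice because I need to preserve the following rules: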

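  1. X1 and X2 are always smaller than X3
    1. If X2 exists but X1 and X3 do not, then X2 should represent the average of X1 and X3
    2. If X2 does not exist, it should be imputed as the average of X1 and X3
  2. X3 is always larger than X1 and X2

However, when I run mice, I get the following error:
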
Error: If no blocks are specified, predictorMatrix must have same number of rows and columns

This is what I tried.

data_matrix <- matrix(c(
  "log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6",
  1, 4.17709744, 7.20919283, 7.8779374, 7.603399, -0.4155154, 11.644787,
  NA, NA, NA, 7.906915, -0.4094731, 11.736925,
  2.32717038, 2.87198430, 3.2226766, 6.249975, -0.4877604, 9.890858,
  NA, NA, NA, 8.419139, -0.6616485, 10.832142,
  1.88382196, 5.03128887, 5.7027216, 7.083388, -0.4748152, 10.769201,
  NA, NA, NA, 7.538495, -0.4034671, 11.530854,
  5.53207044, 6.41177122, 6.8724143, 7.377134, -0.5074978, 11.823149,
  NA, NA, NA, 6.655440, -0.5727010, 10.649109,
  NA, NA, NA, 7.226936, -0.5276327, 11.324642,
  NA, NA, NA, 7.945555, -0.6655320, 11.666239,
  NA, NA, NA, 7.335634, -0.3989861, 11.617934,
  NA, NA, NA, 7.137278, -0.5276327, 11.196226,
  NA, NA, NA, 7.340187, -0.4185503, 10.607278,
  NA, NA, NA, 6.898715, -0.5464528, 10.506601,
  NA, NA, NA, 6.432940, -0.2757535, 11.886294,
  NA, 5.30275053, NA, 8.072779, -0.4620355, 11.577551
), nrow = 16, byrow = TRUE)

colnames(data_matrix) <- c("log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6")

df_a <- as.data.frame(data_matrix)

I then have a custom function:

custom_impute <- function(data, predictorMatrix, ...) {
  
  # Select relevant columns
  X1 <- data$X1
  X2 <- data$X2
  X3 <- data$X3
  Y1 <- data$Y1
  Y2 <- data$Y2
  Y3 <- data$Y3
  
  # Impute X1 based on Y1, Y2, and Y3
  X1[is.na(X1)] <- Y1 + Y2 + Y3
  
  # Impute X3 based on Y1, Y2, and Y3
  X3[is.na(X3)] <- Y1 + Y2 + Y3
  
  # Calculate the average of X1 and X3
  avg_X1_X3 <- (X1 + X3) / 2
  
  # Impute X2 based on the average of X1 and X3 if X2 is missing
  X2[is.na(X2) & !is.na(avg_X1_X3)] <- avg_X1_X3[is.na(X2) & !is.na(avg_X1_X3)]
  
  # Update the imputed values in the dataset
  data$X1 <- X1
  data$X2 <- X2
  data$X3 <- X3
  
  return(data)
}

where the predictorMatrix is assembled as follows:

predictor_matrix <- as.matrix(!is.na(df_a[, c("log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6")]))

predictor_matrix[, c("log_Var1", "log_Var2", "log_Var3")] <- FALSE

with n_imputations <- 1000
and the corresponding imputation call:

imputed_data_A <- mice(df_a, m = n_imputations, predictorMatrix = predictor_matrix, method = custom_impute)

3htmauhk1#

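The first part of the solution is indeed (as @jay.sf pointed out) to remove the column names from the data matrix, along with the stray leading 1. Strings inside c() coerce every value to character, and the seven extra entries mean the values no longer fill 16 rows of 6 columns.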

data_matrix <- matrix(c(
  #"log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6",
  #1,
  4.17709744, 7.20919283, 7.8779374, 7.603399, -0.4155154, 11.644787,
  NA, NA, NA, 7.906915, -0.4094731, 11.736925,
  2.32717038, 2.87198430, 3.2226766, 6.249975, -0.4877604, 9.890858,
  NA, NA, NA, 8.419139, -0.6616485, 10.832142,
  1.88382196, 5.03128887, 5.7027216, 7.083388, -0.4748152, 10.769201,
  NA, NA, NA, 7.538495, -0.4034671, 11.530854,
  5.53207044, 6.41177122, 6.8724143, 7.377134, -0.5074978, 11.823149,
  NA, NA, NA, 6.655440, -0.5727010, 10.649109,
  NA, NA, NA, 7.226936, -0.5276327, 11.324642,
  NA, NA, NA, 7.945555, -0.6655320, 11.666239,
  NA, NA, NA, 7.335634, -0.3989861, 11.617934,
  NA, NA, NA, 7.137278, -0.5276327, 11.196226,
  NA, NA, NA, 7.340187, -0.4185503, 10.607278,
  NA, NA, NA, 6.898715, -0.5464528, 10.506601,
  NA, NA, NA, 6.432940, -0.2757535, 11.886294,
  NA, 5.30275053, NA, 8.072779, -0.4620355, 11.577551
), nrow = 16, byrow = TRUE)

colnames(data_matrix) <- c("log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6")

df_a <- as.data.frame(data_matrix)
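
To see why the header strings had to go, here is a minimal sketch with made-up toy values (not the data from the question). The names bad, good and df_good are hypothetical and only serve the illustration.

```r
# Toy values only; these are not the numbers from the question.
# Mixing strings and numbers in c() coerces everything to character.
bad <- matrix(c("a", "b", 1, 2, 3, 4), nrow = 3, byrow = TRUE)

# Numbers only, with the names attached afterwards via colnames().
good <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
colnames(good) <- c("a", "b")

typeof(bad)   # "character"
typeof(good)  # "double"

# Only the numeric matrix yields numeric data frame columns,
# which is what numeric imputation methods need to work with.
df_good <- as.data.frame(good)
sapply(df_good, is.numeric)
```

A quick str(df_a) after building the data frame is an easy way to confirm that all six columns came out numeric.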

Then, for your imputation method, mice expects imputation functions to have a specific format, e.g.

mice.impute.mean <- function(y, ry, x = NULL, wy = NULL, ...) {...}

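for mean imputation. Since your imputation model is multivariate, I suggest mimicking the structure of the multivariate imputation method mpmm. You can print its code in R with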

mice.impute.mpmm

or browse the source code in the mice GitHub repo.
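
As a rough illustration of that format, the sketch below defines a hypothetical univariate method called ysum (it is not part of mice) that fills missing values with the row sum of the predictors, mirroring the Y1 + Y2 + Y3 rule from the question. The toy data frame, the method name and m = 2 are assumptions made for the example. It also builds the predictor matrix with make.predictorMatrix(), which returns a square variables-by-variables matrix. By contrast, !is.na(df_a) has one row per observation, which is the likely trigger for the error in the question.

```r
# Hypothetical method "ysum": mice resolves method = "ysum" to a function
# named mice.impute.ysum. It must return one value per TRUE entry of wy.
mice.impute.ysum <- function(y, ry, x, wy = NULL, ...) {
  if (is.null(wy)) wy <- !ry       # default: impute where y is missing
  rowSums(x[wy, , drop = FALSE])   # deterministic fill: sum of the predictors
}

# Direct call on toy inputs; this part does not need mice at all.
y  <- c(1.5, NA, 3.5)
ry <- !is.na(y)
x  <- cbind(Y1 = c(1, 2, 3), Y2 = c(0.5, 0.5, 0.5))
direct <- mice.impute.ysum(y, ry, x)
direct  # 2.5

# The same method used through mice, if the package is installed.
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  # Simulated toy data (an assumption, not the data from the question).
  set.seed(1)
  n <- 30
  toy <- data.frame(Y1 = rnorm(n), Y2 = rnorm(n), Y3 = rnorm(n))
  toy$X1 <- toy$Y1 + toy$Y2 + toy$Y3 + rnorm(n, sd = 0.1)
  toy$X1[sample(n, 10)] <- NA

  # Square predictor matrix: rows are targets, columns are predictors.
  pred <- make.predictorMatrix(toy)
  pred[, ] <- 0
  pred["X1", c("Y1", "Y2", "Y3")] <- 1

  # Method vector: one entry per column, passed as a name, not a function.
  meth <- make.method(toy)
  meth["X1"] <- "ysum"

  imp <- mice(toy, m = 2, method = meth, predictorMatrix = pred,
              printFlag = FALSE, seed = 1)
  head(complete(imp, 1))
}
```

Because ysum is deterministic, every completed data set would get the same X1 values, so it only demonstrates the calling convention. A method that draws values, as mpmm does, keeps the between-imputation variability that multiple imputation relies on.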
