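I am imputing three columns, X1, X2 and X3, based on three other columns, Y1, Y2 and Y3. I prefer a custom imputation to pmm from mice because I need to preserve the following rules. X1 and X2 are always less than X3.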
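1. If X2 is present but X1 and X3 are missing, then X2 should represent the average of X1 and X3.
2. If X2 is missing, then it should be the average of the (imputed) X1 and X3.
X3 is always greater than X1 and X2.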
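The rules above can be expressed as a quick check on a completed data set. This is only a minimal sketch: the helper name `check_rules`, the `x2_imputed` flag and the toy values are hypothetical and not part of the original post. The midpoint rule is checked only for rows where X2 was imputed.

```r
# Hypothetical helper: checks the stated rules row by row on completed data.
# x2_imputed marks the rows where X2 was filled in rather than observed.
check_rules <- function(df, x2_imputed, tol = 1e-8) {
  # Ordering rule: X1 and X2 must both be smaller than X3
  ordered <- df$X1 < df$X3 & df$X2 < df$X3
  # Midpoint rule: an imputed X2 should equal the average of X1 and X3
  is_mid <- abs(df$X2 - (df$X1 + df$X3) / 2) < tol
  data.frame(ordered = ordered,
             midpoint_ok = ifelse(x2_imputed, is_mid, NA))
}

# Toy values for illustration only
toy <- data.frame(X1 = c(2, 5), X2 = c(4, 7), X3 = c(6, 6))
result <- check_rules(toy, x2_imputed = c(TRUE, FALSE))
print(result)
```

Running such a check on each completed data set would show whether an imputation method actually preserved the constraints.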
However, when I run mice, I get the following error:
Error: If no blocks are specified, predictorMatrix must have same number of rows and columns
This is what I tried.
data_matrix <- matrix(c(
"log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6",
1, 4.17709744, 7.20919283, 7.8779374, 7.603399, -0.4155154, 11.644787,
NA, NA, NA, 7.906915, -0.4094731, 11.736925,
2.32717038, 2.87198430, 3.2226766, 6.249975, -0.4877604, 9.890858,
NA, NA, NA, 8.419139, -0.6616485, 10.832142,
1.88382196, 5.03128887, 5.7027216, 7.083388, -0.4748152, 10.769201,
NA, NA, NA, 7.538495, -0.4034671, 11.530854,
5.53207044, 6.41177122, 6.8724143, 7.377134, -0.5074978, 11.823149,
NA, NA, NA, 6.655440, -0.5727010, 10.649109,
NA, NA, NA, 7.226936, -0.5276327, 11.324642,
NA, NA, NA, 7.945555, -0.6655320, 11.666239,
NA, NA, NA, 7.335634, -0.3989861, 11.617934,
NA, NA, NA, 7.137278, -0.5276327, 11.196226,
NA, NA, NA, 7.340187, -0.4185503, 10.607278,
NA, NA, NA, 6.898715, -0.5464528, 10.506601,
NA, NA, NA, 6.432940, -0.2757535, 11.886294,
NA, 5.30275053, NA, 8.072779, -0.4620355, 11.577551
), nrow = 16, byrow = TRUE)
colnames(data_matrix) <- c("log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6")
df_a <- as.data.frame(data_matrix)
So, I have a custom function:
custom_impute <- function(data, predictorMatrix, ...) {
# Select relevant columns
X1 <- data$X1
X2 <- data$X2
X3 <- data$X3
Y1 <- data$Y1
Y2 <- data$Y2
Y3 <- data$Y3
# Impute X1 based on Y1, Y2, and Y3
X1[is.na(X1)] <- Y1 + Y2 + Y3
# Impute X3 based on Y1, Y2, and Y3
X3[is.na(X3)] <- Y1 + Y2 + Y3
# Calculate the average of X1 and X3
avg_X1_X3 <- (X1 + X3) / 2
# Impute X2 based on the average of X1 and X3 if X2 is missing
X2[is.na(X2) & !is.na(avg_X1_X3)] <- avg_X1_X3[is.na(X2) & !is.na(avg_X1_X3)]
# Update the imputed values in the dataset
data$X1 <- X1
data$X2 <- X2
data$X3 <- X3
return(data)
}
where the predictorMatrix is assembled as follows:
predictor_matrix <- as.matrix(!is.na(df_a[, c("log_Var1", "log_Var2", "log_Var3", "log_Var4", "log_Var5", "log_Var6")]))
predictor_matrix[, c("log_Var1", "log_Var2", "log_Var3")] <- FALSE
with n_imputations <- 1000 and the corresponding imputation call:
imputed_data_A <- mice(df_a, m = n_imputations, predictorMatrix = predictor_matrix, method = custom_impute)
1 Answer
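The first part of the solution is indeed (as @jay.sf pointed out) to remove the column names from the data matrix, as well as the first 1. Then, for your imputation method, mice expects the imputation function to have a specific format; see, for example, the built-in method for mean imputation. Since your imputation model is multivariate, I would suggest mimicking the structure of the multivariate imputation method mpmm. Its code can be printed in R, or you can browse the source code in the mice GitHub repo.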
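As a rough illustration of the points above, here is a minimal sketch in base R. It makes several assumptions: only six of the question's rows are used, log_Var1 to log_Var3 are taken to be the columns to impute and log_Var4 to log_Var6 the predictors, and `mice.impute.midpoint` is a hypothetical name for a custom method written in the univariate format `function(y, ry, x, wy = NULL, ...)` that mice uses for methods such as `mice.impute.mean`. The values in `x_demo` are toy numbers.

```r
# Subset of the question's rows, entered as plain numbers:
# no header row inside matrix() and no stray leading 1.
vars <- paste0("log_Var", 1:6)
df_a <- as.data.frame(matrix(c(
  4.17709744, 7.20919283, 7.8779374, 7.603399, -0.4155154, 11.644787,
  NA,         NA,         NA,        7.906915, -0.4094731, 11.736925,
  2.32717038, 2.87198430, 3.2226766, 6.249975, -0.4877604,  9.890858,
  NA,         NA,         NA,        8.419139, -0.6616485, 10.832142,
  1.88382196, 5.03128887, 5.7027216, 7.083388, -0.4748152, 10.769201,
  NA,         5.30275053, NA,        8.072779, -0.4620355, 11.577551
), ncol = 6, byrow = TRUE, dimnames = list(NULL, vars)))

# The predictor matrix must be square (variables x variables):
# each row is a target, each column flags a predictor for that target.
pred <- matrix(0, 6, 6, dimnames = list(vars, vars))
pred[vars[c(1, 3)], vars[4:6]] <- 1    # log_Var1, log_Var3 from log_Var4..6
pred["log_Var2", vars[c(1, 3)]] <- 1   # log_Var2 from log_Var1 and log_Var3

# Hypothetical custom method in the univariate format:
# y  = target column, ry = TRUE where y is observed,
# x  = numeric matrix of the selected predictors,
# wy = TRUE where an imputation is wanted (defaults to the missing entries).
# It must return one imputed value per TRUE entry of wy.
mice.impute.midpoint <- function(y, ry, x, wy = NULL, ...) {
  if (is.null(wy)) wy <- !ry
  rowMeans(x[wy, , drop = FALSE])
}

# Direct call on toy values, outside of mice, to show the contract
x_demo <- cbind(log_Var1 = c(2, 4), log_Var3 = c(6, 10))
imputed <- mice.impute.midpoint(y = c(NA, 5), ry = c(FALSE, TRUE), x = x_demo)
print(imputed)
```

With mice loaded, a method defined this way would be selected by name, for instance `method = c("pmm", "midpoint", "pmm", "", "", "")` together with `predictorMatrix = pred` and a `visitSequence` that visits log_Var2 last. That call has not been run here, and it does not by itself guarantee that log_Var1 stays below log_Var3; a multivariate method modelled on mpmm, as suggested above, would be one way to handle that ordering jointly.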