R语言 拟合多个零膨胀负宾模型并得到合并结果

xvw2m8pv  于 2023-02-26  发布在  其他
关注(0)|答案(1)|浏览(173)

我有一个包含患者数据的数据集。一些患者在医院中多次住院,但观察结果是独立的(我尝试了一个多水平模型,但对这些数据来说是不可能的)我创建了一个数据集,对于每个有多次住院的病人,只随机选择一次住院。我为这些患者创建了100个随机住院数据集。我的因变量是计数变量,零膨胀负二项模型拟合最佳。我已经设法对每个数据集运行了回归模型(数据集由变量“sample”标识),但我不知道如何获得所有这100个回归的合并结果,我希望获得每个预测变量的计数模型和零膨胀模型的合并结果,我使用:library(dplyr); library(tidyverse); library(pscl); library(broom); library(jtools); library(mice)
pool函数来自mice
我创建的组合数据集如下所示:

set.seed(12345)

Combined_randcase <- bind_rows(replicate(100, cohort_1 %>% group_by(patient) %>%
                                  slice_sample(n=1, replace = TRUE), simplify=F), .id="sample")

Combined_randcase <- data.frame(as.list(Combined_randcase))

我在每个数据集上运行ZINB回归模型,按“样本”分组,如下所示(使用broom包):

regr_comb_randcase.zeroinfl = Combined_randcase %>% 
nest_by(sample) %>% 
mutate(model = list(zeroinfl(formula = cm_number ~ after_wm + age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication | age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = cohort_1, na.action = na.exclude, dist = "negbin"))) 
%>%
  summarise(tidy(model))

这就是我试图获得汇总结果的方法:

models.zeroinfl <- regr_comb_randcase.zeroinfl$model

pool_results.zeroinfl <- pool(regr_comb_randcase.zeroinfl)

当运行第二行时,我得到这个错误:

Error: No tidy method for objects of class character

对于另一个逻辑回归模型,我成功地做到了这一点:

regr_comb_randcase.log = Combined_randcase %>% 
group_by(sample) %>% 
do(model = glm(cm ~ after_wm + age + gender_male +ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = ., family = binomial()))

models <- regr_comb_randcase.log$model

pool_results <- pool(models)

summary(pool_results)

dput(cohort_1_example)的输出(我的数据集的缩短版本)用于再现性:

structure(list(case = c("20001879", "20009253", "20003748", "20002321", 
"20001662", "1910967", "20008058", "20010686", "20010938", "20009508", 
"20002307", "20010105", "210098181", "21009818", "210100261", 
"21010026", "21000865", "21002199", "1906803", "1907642", "20008274", 
"21000858", "21004557", "1910669", "21004451", "21000202", "21000812", 
"21001006", "21001143", "21001423", "1906820", "21000448", "21002128", 
"21002666", "21003560", "1907070", "20011121", "1907614", "20002748", 
"20010645", "21001363", "1908906", "1910981", "1905926", "21002429", 
"21004264", "20011209", "20010442", "20009977", "1906382", "1909409", 
"1908904", "1910516", "20001534", "20011201", "1907432", "1908332", 
"1906356", "20011026", "20008206", "20000809", "1910664", "1908673", 
"1907610", "1911046", "20008505", "20009385", "21000530", "1909620", 
"1909730", "1910988", "20009899", "1907282", "1906671", "20007870", 
"1910749", "20010782", "20009808", "20003311", "1910722", "1910529", 
"1906638", "1906861", "1906956", "1910743", "20002057", "21000891", 
"20010349", "20008503", "1906093", "1910662", "20008093", "20010683", 
"20008787", "20003631", "20007796", "20008089", "21004141", "20010177", 
"20001316", "1909809", "20001875", "20009552", "20001443", "21000419", 
"20003106", "1909773", "21004600", "20008105", "21002070", "1908245", 
"1909860", "21004209", "21003022", "20003151", "20011037", "21001966", 
"20009902", "1906202", "1910009", "1910777", "20010294", "1910072"
), patient = c("10", "11", "100", "100", "101", "102", "103", 
"105", "106", "107", "108", "11", "11", "11", "11", "11", "1000", 
"1001", "1002", "1002", "1003", "1003", "1004", "1005", "1005", 
"1006", "1008", "1009", "1009", "1009", "1011", "1012", "1013", 
"1013", "1013", "1014", "1016", "1017", "1018", "1020", "1020", 
"1021", "1022", "1023", "1026", "1026", "1029", "1030", "1033", 
"1035", "1036", "1037", "1037", "1037", "1039", "1041", "1041", 
"1042", "1042", "1043", "1044", "1045", "1046", "1047", "1048", 
"1049", "1050", "1053", "1054", "1056", "1056", "1057", "1058", 
"1060", "1061", "1064", "1064", "1064", "1065", "1066", "1067", 
"1067", "1067", "1067", "1069", "1071", "1072", "1073", "1074", 
"1075", "1075", "1076", "1077", "1078", "1079", "1080", "1081", 
"1082", "1083", "1086", "1087", "1087", "1088", "1089", "1089", 
"1090", "1091", "1091", "1092", "1093", "1094", "1094", "1095", 
"1096", "1098", "1098", "1098", "1099", "1048", "1048", "1021", 
"1018", "1011"), cm = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0), cm_number = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 5, 0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 2, 2, 0, 0, 1, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0), total_cm_duration = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 165.000000003492, 0, 0, 0, 0, 0, 0, 
0, 0, 174.999999994179, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 259.999999998836, 
720, 0, 0, 0, 0, 0, 60.0000000069849, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 815.000000005821, 0, 0, 10865.0000000023, 
0, 0, 0, 0, 0, 0, 420.000000006985, 0, 0, 0, 0, 0, 200.000000002328, 
0, 0, 0, 0, 0, 0, 239.999999996508, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1084.99999999534, 0, 0, 145.000000001164, 0, 0, 789.999999997672, 
435.000000003492, 0, 0, 60.0000000069849, 0, 0, 0, 0, 775.000000001164, 
0, 0, 0, 0, 0, 0, 0), after_wm = c(0, 1, 0, 0, 0, 0, 1, 1, 1, 
1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 
0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 
0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 
1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 
0, 1, 1, 1, 0, 0, 0, 1, 0), age = c(26, 27, 53, 53, 26, 28, 30, 
57, 39, 50, 49, 27, 28, 28, 27, 27, 89, 18, 22, 22, 21, 21, 58, 
35, 36, 63, 44, 35, 35, 35, 25, 24, 36, 36, 36, 62, 50, 21, 55, 
23, 23, 44, 53, 71, 39, 39, 79, 47, 81, 43, 39, 21, 22, 22, 79, 
22, 22, 33, 35, 86, 27, 42, 20, 30, 25, 22, 26, 62, 54, 46, 46, 
46, 79, 39, 21, 63, 64, 64, 31, 59, 70, 70, 70, 70, 49, 37, 49, 
63, 74, 38, 39, 74, 50, 72, 61, 80, 51, 45, 67, 45, 76, 76, 61, 
30, 31, 35, 48, 49, 45, 30, 76, 76, 20, 18, 20, 20, 21, 51, 24, 
24, 45, 55, 25), gender_male = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 1, 0, 0, 0, 0, 0), ref_mode_police = c(0, 0, 0, 0, 0, 0, 
1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0), ref_lg_invol = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0), ref_reas_selfharm = c(1, 
1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 
1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 
1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 
0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0), ref_reas_aggrpers = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), comm_limited = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), duration_days = c(41, 
1, 2, 42, 3, 46, 3, 8, 1, 1, 21, 64, 25, 2, 25, 2, 22, 26, 7, 
29, 101, 119, 153, 2, 2, 51, 10, 2, 6, 49, 83, 5, 1, 8, 1, 36, 
71, 1, 7, 9, 166, 41, 2, 76, 12, 1, 25, 40, 4, 0, 2, 28, 1, 3, 
49, 29, 54, 95, 119, 29, 28, 26, 43, 1, 15, 121, 22, 28, 73, 
13, 39, 1, 119, 14, 73, 18, 124, 32, 2, 120, 67, 2, 2, 8, 29, 
27, 34, 32, 112, 6, 8, 38, 118, 24, 38, 20, 2, 1, 2, 9, 1, 21, 
42, 57, 49, 1, 1, 1, 35, 2, 45, 23, 64, 29, 2, 6, 56, 0, 5, 3, 
58, 51, 2), diagnosis_personality = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_psychosis = c(1, 1, 
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 
0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 
1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1), diagnosis_mania = c(0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_substance = c(0, 
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 
0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1), intoxication = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)), row.names = c(NA, 
-123L), class = c("tbl_df", "tbl", "data.frame"))

我可以找到用线性模型和逻辑模型来做这个的信息,但是没有用零膨胀的negbin模型来做,也许这就是为什么tidy不起作用的原因吧,任何帮助都是非常感谢的。

suzh9iv8

suzh9iv81#

我们可以拟合混合/多水平模型的数据与单例组;主要的约束是总体上是否有足够的信息使组间方差的估计可行。
在主流软件包中,lme4可以拟合LMM、逻辑GLMM和负二项混合模型(尽管速度有点慢,而且不是零膨胀混合模型,需要大量工作:glmmTMB可以处理上述所有内容。brms可以做glmmTMB可以做的任何事情,加上厨房Flume,但速度较慢(因为贝叶斯MCMC),可能需要您深入了解贝叶斯/MCMC采样的杂草。
下面的例子使用了你的数据子集,模型大大简化了。你的样本数据有86个病人,61个单例,123个总观察值。这似乎几乎足以拟合简化的模型(下图);我们在ZINB拟合(零膨胀项收敛于零概率,NB离散参数收敛于泊松分布)和逻辑拟合(奇异拟合,即患者间方差收敛于零)中确实遇到了一些典型的数据不足问题。如果您的完整数据集很大,这些问题发生的可能性要小得多......
机器的第一部分(除了加载包之外)是装饰性的,以避免太多重复的代码,并使它更容易看到所有模型中包含的公共预测器集。

library(lme4)
library(glmmTMB)
## list of all of the fixed effect predictors
fix_vars <- c("after_wm", "age", "gender_male", "ref_mode_police")

## 'resp' is the name of the response variable (character)
ff <- function(resp) {
   reformulate(c(fix_vars, "(1|patient)"), resp = resp)
}
## Gaussian/LMM
lme4::lmer(ff("total_cm_duration"), data = dd)

## NB/no zero-inflation
glmmTMB::glmmTMB(ff("cm_number"),
                 family = nbinom2,
                 data = dd)

## NB with (simple) zero-inflation
glmmTMB::glmmTMB(ff("cm_number"),
                 family = nbinom2,
                 ## could use zi = ff(NULL) to include all FE predictors
                 ##  as ZI predictors as well ...
                 zi = ~1,
                 data = dd)

## logistic
lme4::glmer(ff("cm"),
            family = binomial,
            data = dd)

相关问题