How to run logistic regressions by group with dplyr?

Posted by v8wbuo2f on 2023-03-27 in Other

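I have a tree database with the number of cells in different growth phases (enlarging, thickening and maturing) for different trees over several years. The counts are recorded on different days of the year (DOY): January 1 is DOY 1, January 2 is DOY 2, and so on. I simplified it into a reproducible example like this: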

library(dplyr)

df <- data.frame("Year" = c(2012, 2012, 2012, 2012, 2012, 2012, 2012,
                            2012, 2012, 2012, 2013, 2013, 2013,
                            2013, 2013, 2013, 2013, 2013, 2013, 2013),
                 "Tree" = c(15, 15, 15, 15, 15, 22, 22, 22, 22, 22, 41, 41,
                            41, 41, 41, 53, 53, 53, 53, 53),
                 "DOY" = c(65, 97, 125, 177, 214, 65, 97, 125, 177, 214,
                           61, 99, 118, 166, 221, 61, 99, 118, 166, 221),
                 "Enlarging" = c(0, 2, 4, 5, 0, 0, 3, 6, 3, 0, 0, 5, 4, 4, 0, 0, 4, 7, 5, 0),
                 "Thickening" = c(0, 0, 2, 4, 0, 0, 0, 4, 3, 0, 0, 0, 3, 2, 0, 0, 2, 4, 2, 0),
                 "Maturing" = c(0, 0, 3, 7, 0, 0, 0, 3, 4, 0, 0, 3, 6, 8, 0, 0, 0, 4, 7, 0))

df <- df %>%
  mutate(Year = as.factor(Year),
         Tree = as.factor(Tree),
         DOY = as.numeric(DOY),
         Enlarging = as.numeric(Enlarging),
         Maturing = as.numeric(Maturing))

print(df)
   Year Tree DOY Enlarging Thickening Maturing
1  2012   15  65         0          0        0
2  2012   15  97         2          0        0
3  2012   15 125         4          2        3
4  2012   15 177         5          4        7
5  2012   15 214         0          0        0
6  2012   22  65         0          0        0
7  2012   22  97         3          0        0
8  2012   22 125         6          4        3
9  2012   22 177         3          3        4
10 2012   22 214         0          0        0
11 2013   41  61         0          0        0
12 2013   41  99         5          0        3
13 2013   41 118         4          3        6
14 2013   41 166         4          2        8
15 2013   41 221         0          0        0
16 2013   53  61         0          0        0
17 2013   53  99         4          2        0
18 2013   53 118         7          4        4
19 2013   53 166         5          2        7
20 2013   53 221         0          0        0
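The DOY column follows the convention described above, where January 1 is DOY 1. If the raw data held calendar dates instead, base R could compute DOY with `format(x, "%j")`.

The sketch below assumes two dates that correspond to DOY 65 in 2012 (a leap year) and DOY 61 in 2013. The original calendar dates are not given, so these are for illustration only.

```r
# Day of year (DOY): January 1 is DOY 1
dates <- as.Date(c("2012-03-05", "2013-03-02"))  # assumed example dates
doy <- as.integer(format(dates, "%j"))           # "%j" is the zero-padded day of the year
print(doy)

# Converting a DOY back to a date needs the year, because leap years shift later dates
back <- as.Date(c("2012-01-01", "2013-01-01")) + (doy - 1)
print(back)
```

Because 2012 is a leap year, DOY 61 falls on March 1 in 2012 but on March 2 in 2013.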

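I want to fit a separate logistic regression between the number of cells in each growth phase and DOY (for enlarging, for example: Enlarging ~ DOY) for every tree in every year. I have tried several things. For example, I grouped by year and tree and applied the logistic regression to each growth phase, one at a time: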

df_enlarging <- df %>%
  select(Tree, Year, Enlarging)%>%
  group_by(Tree, Year)%>%
  mutate(the_glm = glm(Enlarging ~ DOY, family = "binomial", data = df),
         Fitted = predict(the_glm, type = "response"))

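I also tried pivoting my data to long format and nesting it, so that I could apply the logistic regression to all three growth phases of every tree in every year at once. Then I did the same thing, like this: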

df_long <- df %>% 
  pivot_longer(Enlarging:Maturing,
               names_to = 'Growth_Phase',
               values_to = 'Count') %>%
  ungroup()

df_nested <- df_long %>%
  nest_by(Year, Tree, as.factor(Growth_Phase)) #tried converting growth_phase to factor also

df_glm <- df_nested %>%
  rowwise() %>%
  mutate(the_glm = list(glm(Count ~ DOY, family = "binomial", data = data)),
         Fitted = list(predict(the_glm, type = "response")))

None of this works. In both cases I get the same error: `Problem while computing the_glm = glm(Enlarging ~ DOY, family = "binomial", data = data). Caused by error: ! y values must be 0 <= y <= 1`. Does anyone know what I can do to fix this? Thanks a lot.
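The error comes from the binomial family itself. When the response is a single numeric vector, `glm()` with `family = binomial` requires every value to lie between 0 and 1 (0/1 outcomes or proportions). `Enlarging`, however, holds cell counts such as 2, 4 and 5.

A minimal base-R sketch reproduces the check using the values for tree 15 in 2012 from the example data:

```r
# DOY and Enlarging counts for tree 15 in 2012, copied from the example data
doy <- c(65, 97, 125, 177, 214)
enlarging <- c(0, 2, 4, 5, 0)

# binomial()'s initialize step rejects any response value outside [0, 1]
err <- tryCatch(
  glm(enlarging ~ doy, family = binomial),
  error = function(e) conditionMessage(e)
)
print(err)
```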

mpgws1up

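First, as joran suggested: yes, this is easiest if you convert your data to 1/0 outcomes.
I did this in four steps. First I removed the rows with 0. Then I "uncounted" the remaining data so that each counted cell becomes its own row, and set each new row to 1. Finally I added the 0 rows back. After that it is a simple grouping, then mapping the model over each group with purrr::map, and using broom to tidy the results and predict on the data set.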
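To see what the expansion does, here is a sketch on a hypothetical three-row data set (`toy`, `zeros` and `binary` are illustrative names). It uses `filter(Enlarging > 0)` in place of `anti_join()`; on this data the two give the same rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one tree, three sampling dates, enlarging cell counts
toy <- data.frame(DOY = c(65, 97, 125), Enlarging = c(0, 2, 3))

# Rows with a count of 0 are kept aside as single "0" outcomes
zeros <- filter(toy, Enlarging == 0)

binary <- toy %>%
  filter(Enlarging > 0) %>%   # rows with at least one cell
  uncount(Enlarging) %>%      # one row per counted cell; the count column is dropped
  mutate(Enlarging = 1) %>%   # each counted cell becomes a "1" outcome
  bind_rows(zeros)            # add the zero rows back

print(binary)
```

Two cells at DOY 97 and three at DOY 125 become five rows with outcome 1. DOY 65 stays as a single row with outcome 0.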

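library(dplyr)
library(tidyr)
library(purrr)

# Rows with zero enlarging cells are kept aside as the "0" outcomes
df %>% 
  filter(Enlarging==0) -> e0

df %>% 
  anti_join(e0) %>%            # keep only rows with at least one enlarging cell
  uncount(Enlarging) %>%       # one row per enlarging cell
  mutate(Enlarging = 1) %>%    # each of those rows is a "1" outcome
  bind_rows(e0) %>%            # add the zero rows back as "0" outcomes
  mutate(Year = as.factor(Year),
         Tree = as.factor(Tree),
         DOY = as.numeric(DOY),
         Enlarging = as.numeric(Enlarging),
         Maturing = as.numeric(Maturing)) %>% 
  group_nest(Tree, Year) %>%   # one nested data frame per tree and year
  mutate(the_glm = map(data, ~glm(Enlarging~DOY, family="binomial", data = .)),
         result = map(the_glm, broom::tidy),      # coefficient table per model
         glance = map(the_glm, broom::glance),    # model-level statistics
         fits = map(the_glm, broom::augment, type.predict = "response")) %>%  # fitted probabilities
  unnest(fits)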
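The same 1/0 expansion can also cover the question's second attempt: fitting all three growth phases for every tree and year in one pipeline. This is a sketch, assuming each counted cell should be one "1" observation and each zero count one "0" observation, as in the answer above.

A few notes on the choices here:

- `Present` and `n` are hypothetical helper columns.
- `pmax(Count, 1)` keeps a zero count as one row.
- `coef()` replaces broom to keep dependencies small.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Example data from the question (Year and Tree left numeric here)
df <- data.frame(
  Year = rep(c(2012, 2013), each = 10),
  Tree = rep(c(15, 22, 41, 53), each = 5),
  DOY = c(65, 97, 125, 177, 214, 65, 97, 125, 177, 214,
          61, 99, 118, 166, 221, 61, 99, 118, 166, 221),
  Enlarging = c(0, 2, 4, 5, 0, 0, 3, 6, 3, 0, 0, 5, 4, 4, 0, 0, 4, 7, 5, 0),
  Thickening = c(0, 0, 2, 4, 0, 0, 0, 4, 3, 0, 0, 0, 3, 2, 0, 0, 2, 4, 2, 0),
  Maturing = c(0, 0, 3, 7, 0, 0, 0, 3, 4, 0, 0, 3, 6, 8, 0, 0, 0, 4, 7, 0)
)

models <- df %>%
  pivot_longer(Enlarging:Maturing, names_to = "Growth_Phase", values_to = "Count") %>%
  # A zero count stays as one row with outcome 0; a count k becomes k rows with outcome 1
  mutate(Present = as.numeric(Count > 0), n = pmax(Count, 1)) %>%
  uncount(n) %>%
  # One nested data frame per year, tree and growth phase
  group_nest(Year, Tree, Growth_Phase) %>%
  mutate(
    the_glm = map(data, ~ glm(Present ~ DOY, family = binomial, data = .x)),
    slope = map_dbl(the_glm, ~ coef(.x)[["DOY"]])  # DOY coefficient on the logit scale
  )

print(select(models, Year, Tree, Growth_Phase, slope))
```

Each row of `models` holds one fitted model. For example, `predict(models$the_glm[[1]], type = "response")` returns the fitted probabilities for the first group.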
