R: Restructuring data for survival analysis

I am working with the R programming language.
I have the following data on medical patients:

my_data = data.frame(id = c(1,2,3), status_2017 = c("alive", "alive", "alive"), status_2018 = c("alive", "dead", "alive"), status_2019 = c("alive", "dead", "dead"), height_2017 = rnorm(3,3,3), height_2018 = rnorm(3,3,3), 
                     height_2019 = rnorm(3,3,3) , weight_2017  = rnorm(3,3,3), weight_2018 = rnorm(3,3,3), weight_2019 = rnorm(3,3,3))

cols <- colnames(my_data)
ix <- my_data[, startsWith(cols, "status")] == "dead"

my_data[, startsWith(cols, "height")][ ix ] <- NA
my_data[, startsWith(cols, "weight")][ ix ] <- NA

The data looks like this:

id status_2017 status_2018 status_2019 height_2017 height_2018 height_2019 weight_2017 weight_2018 weight_2019
1  1       alive       alive       alive   3.7276706    4.524869   -1.648458   -1.702781    7.755581    3.369895
2  2       alive        dead        dead   0.7539518          NA          NA    1.060408          NA          NA
3  3       alive       alive        dead   6.6213771    2.122374          NA    5.114120    1.851467          NA

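- My question: **I would like to restructure this data so that:
each patient has their own row for each year
there is a "year" column
status_2017, status_2018, status_2019 are all combined into a single column (i.e. "status")
height_2017, height_2018, height_2019 are all combined into a single column (i.e. "height")
weight_2017, weight_2018, weight_2019 are all combined into a single column (i.e. "weight")
a new variable ("new_var") is created: if a patient id has a row for 2019, new_var is always 0; for all other patient ids, new_var is 0 until that patient's maximum year, where new_var is 1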

I tried to accomplish this with the following code:

library(dplyr)
library(tidyr)

my_data_long <- na.omit(my_data %>%
    pivot_longer(cols = -c(id, status_2017),
                 names_to = c(".value", "year"),
                 names_pattern = "(height|weight)_(\\d{4})") %>%
    arrange(id, year))

final = my_data_long  %>%
  group_by(id) %>%
  mutate(
    new_var = ifelse(any(year == "2019"), 0, 1),
    max_year = max(year)
  ) %>%
  ungroup() %>%
  mutate(
    new_var = ifelse(year == max_year & new_var == 1, 1, 0),
    max_year = NULL
  )

The final result looks like this:

> final
# A tibble: 6 x 6
     id status_2017 year  height weight new_var
  <dbl> <chr>       <chr>  <dbl>  <dbl>   <dbl>
1     1 alive       2017   2.39    2.27       0
2     1 alive       2018  -0.541   1.63       0
3     1 alive       2019  -1.93   10.1        0
4     2 alive       2017   4.18   -3.35       1
5     3 alive       2017  -1.35    7.12       0
6     3 alive       2018   1.42    1.70       1

My ultimate goal is to restructure this dataset so that I can fit a time-varying survival analysis model (e.g. Cox PH) to the data (e.g. https://atm.amegroups.com/article/view/18820/html, https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf)

- Can someone please tell me whether I am doing this correctly?**

Thanks!

Note: I also tried adding start and end times for each id.

It looks like this:

library(stringr)

final %>%
  group_by(id) %>%
  mutate(start = 0:(n() - 1),
         end = 1:n()) %>%
  ungroup()

# A tibble: 6 x 8
     id status_2017 year  height weight new_var start   end
  <dbl> <chr>       <chr>  <dbl>  <dbl>   <dbl> <int> <int>
1     1 alive       2017   2.39    2.27       0     0     1
2     1 alive       2018  -0.541   1.63       0     1     2
3     1 alive       2019  -1.93   10.1        0     2     3
4     2 alive       2017   4.18   -3.35       1     0     1
5     3 alive       2017  -1.35    7.12       0     0     1
6     3 alive       2018   1.42    1.70       1     1     2
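
For context, interval columns like `start` and `end` are the shape that the counting-process form `Surv(start, end, event)` in the survival package expects. Below is a minimal sketch on simulated data. The `toy` data frame, its size, the event probability, and the covariate distributions are all made up for illustration and are not the data above.

```r
library(survival)

set.seed(42)
n_id   <- 60
n_rows <- sample(1:3, n_id, replace = TRUE)  # observed years per (hypothetical) patient
died   <- rbinom(n_id, 1, 0.6)               # 1 = patient dies at the end of follow-up

# One row per patient-year, with (start, end] intervals
toy <- data.frame(
  id    = rep(seq_len(n_id), times = n_rows),
  start = sequence(n_rows) - 1,
  end   = sequence(n_rows)
)

# The event flag is 1 only on the last row of patients who died
toy$event  <- as.integer(toy$end == n_rows[toy$id] & died[toy$id] == 1)

# Made-up time-varying covariates
toy$height <- rnorm(nrow(toy), 170, 10)
toy$weight <- rnorm(nrow(toy), 70, 12)

# Counting-process Cox model: each row contributes risk only over its own interval
fit <- coxph(Surv(start, end, event) ~ height + weight, data = toy)
print(fit)
```

In this layout the event indicator plays the role of `new_var`: it is 0 on every interval except, possibly, a patient's last one.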

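If we need the status column, the status columns must also be included in the pivot to long format; in the question, cols = -c(id, status_2017) excludes "status_2017" from the reshaping. In addition, names_pattern needs to match status as well as height and weight.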

library(dplyr) # version >= 1.1.0 (needed for `.by`)
library(tidyr)
my_data %>%
  pivot_longer(cols = -id, names_to = c(".value", "year"),
               names_pattern = "(height|weight|status)_(\\d{4})") %>%
  drop_na() %>%
  # flag patients with no 2019 row, and record each patient's last observed year
  mutate(new_var = +(!(2019 %in% year)), max_year = max(year), .by = "id") %>%
  # keep the flag only on that last row
  mutate(new_var = +(year == max_year & new_var), max_year = NULL)

Output

# A tibble: 6 × 6
     id year  status height weight new_var
  <dbl> <chr> <chr>   <dbl>  <dbl>   <int>
1     1 2017  alive   9.54   7.47        0
2     1 2018  alive   6.49   5.23        0
3     1 2019  alive   3.75   1.93        0
4     2 2017  alive   4.21   0.619       1
5     3 2017  alive   1.97   5.32        0
6     3 2018  alive  -0.406  8.00        1
