Converting an event-level dataset to patient-level data in R

Posted by 6yjfywim on 2023-07-31 in Other

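I need to convert an event-level dataset into a patient-level one, that is, reshape a long dataset into a wider one using deidnum as the key variable. I also want a column for each event type and a column for its event time. If a patient has the same event more than once, the earliest event time should be used.
Below is a sample of similar data, along with my code: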

df <- read.table(text = "deidnum,eventc,EVENTDT,MI_COMPLICATED
325107,MI,21,1
325107,New Rose Dyspnea Scale 2 or more,1468,NA
418351,New Rose Dyspnea Scale 2 or more,207,NA
839172,New Rose Dyspnea Scale 2 or more,1060,NA
839172,New Rose Dyspnea Scale 2 or more,1718,NA
1487422,MI,990,0
1487422,DEATH,1113,NA
1511165,MI,424,0
1511165,MI,608,1
1511165,New Rose Dyspnea Scale 2 or more,721,NA
", sep = ",", header = TRUE)

library(reshape2)
wide.df <- dcast(df, deidnum ~ eventc)
wide.df


Current output:

deidnum DEATH MI New Rose Dyspnea Scale 2 or more
1  325107     0  1                                1
2  418351     0  0                                1
3  839172     0  0                                2
4 1487422     1  1                                0
5 1511165     0  2                                1

Expected output:

Any suggestions would be greatly appreciated.

Answer #1, by v440hwme

A tidyverse workflow:

library(tidyr)
library(dplyr)

df %>%
  slice_min(EVENTDT, by = c(deidnum, eventc)) %>%
  pivot_wider(id_cols = deidnum, names_from = eventc,
              values_from = c(eventc, EVENTDT),
              values_fn = list(eventc = length),
              values_fill = list(eventc = 0),
              unused_fn = first) %>%
  rename_with(~ sub("eventc_", "", .x), starts_with("eventc"))

# # A tibble: 5 × 8
#   deidnum    MI `New Rose Dyspnea Scale 2 or more` DEATH EVENTDT_MI `EVENTDT_New Rose Dyspnea Scale 2 or more` EVENTDT_DEATH MI_COMPLICATED
#     <int> <int>                              <int> <int>      <int>                                      <int>         <int>          <int>
# 1  325107     1                                  1     0         21                                       1468            NA              1
# 2  418351     0                                  1     0         NA                                        207            NA             NA
# 3  839172     0                                  1     0         NA                                       1060            NA             NA
# 4 1487422     1                                  0     1        990                                         NA          1113              0
# 5 1511165     1                                  1     0        424                                        721            NA              0


  • Note: unused_fn = first groups the rows by the id_cols column (deidnum) and then summarises the unused column (MI_COMPLICATED) with first(). This assumes the rows are already sorted by EVENTDT.
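
The sort-order assumption in the note can be made explicit. The sketch below is a variation on the code above, not the answerer's exact method: it sorts by deidnum and EVENTDT before pivoting, and swaps first for a small wrapper that skips NA values, so that MI_COMPLICATED is taken from the earliest row that actually has a value. The names first_non_na and patient_level are made up for illustration, and first(x, na_rm = TRUE) assumes dplyr 1.1.0 or later. The event-indicator columns are left out to keep the focus on unused_fn.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "deidnum,eventc,EVENTDT,MI_COMPLICATED
325107,MI,21,1
325107,New Rose Dyspnea Scale 2 or more,1468,NA
418351,New Rose Dyspnea Scale 2 or more,207,NA
839172,New Rose Dyspnea Scale 2 or more,1060,NA
839172,New Rose Dyspnea Scale 2 or more,1718,NA
1487422,MI,990,0
1487422,DEATH,1113,NA
1511165,MI,424,0
1511165,MI,608,1
1511165,New Rose Dyspnea Scale 2 or more,721,NA
", sep = ",", header = TRUE)

# Hypothetical helper: first non-missing value, or NA if every value is missing
first_non_na <- function(x) first(x, na_rm = TRUE)

patient_level <- df %>%
  # keep the earliest row for each patient/event pair
  slice_min(EVENTDT, by = c(deidnum, eventc), with_ties = FALSE) %>%
  # make the ordering that unused_fn relies on explicit
  arrange(deidnum, EVENTDT) %>%
  pivot_wider(id_cols = deidnum, names_from = eventc,
              values_from = EVENTDT, names_prefix = "EVENTDT_",
              unused_fn = first_non_na)

patient_level
```

Whether "first non-missing value per patient" is the right rule for MI_COMPLICATED depends on the analysis; it is an assumption of this sketch rather than something stated in the question.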
Answer #2, by u5rb5r59

Merge the dcast() counts with a base reshape() call.

reshape2::dcast(df, deidnum ~ eventc, value.var='MI_COMPLICATED', fun=length) |>
  merge(reshape(df, idvar='deidnum', timevar='eventc', direction='wide')) |>
  suppressWarnings()  ## reshape() warns about repeated events and keeps the first row, which is fine per the OP
#   deidnum DEATH MI New Rose Dyspnea Scale 2 or more EVENTDT.MI MI_COMPLICATED.MI
# 1  325107     0  1                                1         21                 1
# 2  418351     0  0                                1         NA                NA
# 3  839172     0  0                                2         NA                NA
# 4 1487422     1  1                                0        990                 0
# 5 1511165     0  2                                1        424                 0
#   EVENTDT.New Rose Dyspnea Scale 2 or more MI_COMPLICATED.New Rose Dyspnea Scale 2 or more
# 1                                     1468                                              NA
# 2                                      207                                              NA
# 3                                     1060                                              NA
# 4                                       NA                                              NA
# 5                                      721                                              NA
#   EVENTDT.DEATH MI_COMPLICATED.DEATH
# 1            NA                   NA
# 2            NA                   NA
# 3            NA                   NA
# 4          1113                   NA
# 5            NA                   NA
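
reshape() keeps the first row it meets for each deidnum/eventc pair, so the result holds the earliest event time only when the rows are already in time order. A minimal base R sketch that sorts first and drops the later repeats, which also removes the need for suppressWarnings(). The names df_sorted, earliest, counts and wide are illustrative, and table() stands in for the dcast() count step.

```r
df <- read.table(text = "deidnum,eventc,EVENTDT,MI_COMPLICATED
325107,MI,21,1
325107,New Rose Dyspnea Scale 2 or more,1468,NA
418351,New Rose Dyspnea Scale 2 or more,207,NA
839172,New Rose Dyspnea Scale 2 or more,1060,NA
839172,New Rose Dyspnea Scale 2 or more,1718,NA
1487422,MI,990,0
1487422,DEATH,1113,NA
1511165,MI,424,0
1511165,MI,608,1
1511165,New Rose Dyspnea Scale 2 or more,721,NA
", sep = ",", header = TRUE)

# Sort so that the first row per patient/event pair is the earliest one
df_sorted <- df[order(df$deidnum, df$eventc, df$EVENTDT), ]

# Keep only that earliest row for each patient/event pair
earliest <- df_sorted[!duplicated(df_sorted[c("deidnum", "eventc")]), ]

# Event counts per patient, computed from all rows
counts <- as.data.frame.matrix(table(df$deidnum, df$eventc))
counts$deidnum <- as.integer(rownames(counts))

# One row per patient: counts plus the earliest time (and flag) per event
wide <- merge(counts,
              reshape(earliest, idvar = "deidnum", timevar = "eventc",
                      direction = "wide"),
              by = "deidnum")
wide
```

With the sample data the rows are already in time order within each patient, so the sort changes nothing here; it only matters for unsorted input.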


Answer #3, by ohfgkhjo

Here is another tidyverse approach:

library(dplyr)
library(tidyr)

df %>%
  select(deidnum, eventc) %>%
  summarise(n = n(), .by = c(deidnum, eventc)) %>%
  pivot_wider(names_from = eventc, values_from = n, names_prefix = "", values_fill = 0) %>% 
  left_join(df %>%
              group_by(deidnum, eventc) %>%
              filter(EVENTDT == min(EVENTDT)) %>%
              ungroup() %>%
              pivot_wider(names_from = eventc, 
                          values_from = c(EVENTDT, MI_COMPLICATED), 
                          names_sep = "_") %>%
              arrange(deidnum) %>% 
              select(1:5), by = "deidnum"
            )

#   deidnum    MI `New Rose Dyspnea Scale 2 or more` DEATH EVENTDT_MI `EVENTDT_New Rose Dyspnea Scale 2 or more` EVENTDT_DEATH MI_COMPLICATED_MI
#     <int> <int>                              <int> <int>      <int>                                      <int>         <int>             <int>
# 1  325107     1                                  1     0         21                                       1468            NA                 1
# 2  418351     0                                  1     0         NA                                        207            NA                NA
# 3  839172     0                                  2     0         NA                                       1060            NA                NA
# 4 1487422     1                                  0     1        990                                         NA          1113                 0
# 5 1511165     2                                  1     0        424                                        721            NA                 0
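
The counts and the earliest times can also come from a single summarise() followed by one pivot_wider(), which avoids the join and the positional select(1:5). This is a sketch of an alternative, not the answerer's code. The column names n_events and first_time are chosen here for illustration, .by assumes dplyr 1.1.0 or later, and MI_COMPLICATED is left out for brevity.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "deidnum,eventc,EVENTDT,MI_COMPLICATED
325107,MI,21,1
325107,New Rose Dyspnea Scale 2 or more,1468,NA
418351,New Rose Dyspnea Scale 2 or more,207,NA
839172,New Rose Dyspnea Scale 2 or more,1060,NA
839172,New Rose Dyspnea Scale 2 or more,1718,NA
1487422,MI,990,0
1487422,DEATH,1113,NA
1511165,MI,424,0
1511165,MI,608,1
1511165,New Rose Dyspnea Scale 2 or more,721,NA
", sep = ",", header = TRUE)

# One row per patient/event pair: how often it occurred and when it first occurred
event_summary <- df %>%
  summarise(n_events = n(),
            first_time = min(EVENTDT),
            .by = c(deidnum, eventc))

# Spread both summaries into columns; absent events get a count of 0 and a time of NA
patient_wide <- event_summary %>%
  pivot_wider(names_from = eventc,
              values_from = c(n_events, first_time),
              values_fill = list(n_events = 0L))

patient_wide
```

Because min() is computed per group, this version does not depend on the row order of df.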
