R: How can I mutate a category-based column within group_by() using several different pieces of information?

a64a0gku, posted on 2023-10-13 in Other

Suppose I have:

> df
  RECNUM diag.letter diag.number C_DIAG
1      A           S           6  DS060
2      A           T          15  DT151
3      A           S           6 DS061A
4      B           S           6  DS064
5      C           S           6  DS061
6      C           S           6  DS066
7      D           S           2  DS020
8      D           S           2  DS021

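I am trying to define different types of trauma from this information. I want to create df$trauma, which has three levels: "Iso.TBI", "Multi.TBI" and "Multitrauma".
I am looking for a solution in dplyr.
The criteria are:
(1) group_by(RECNUM) or similar.
(2) If a RECNUM has more than one row (the same RECNUM occurs several times), and all of those rows contain the combination diag.letter == "S" and diag.number == 6, or contain more than one of the following codes: C_DIAG %in% c("DS020", "DS021", "DS029", "DS071"), then Multi.TBI
(3) else if diag.letter contains "T", regardless of how many rows there are, then Multitrauma
(4) else Iso.TBI
Here is the expected output: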

> df
  RECNUM diag.letter diag.number C_DIAG      trauma
1      A           S           6  DS060 Multitrauma
2      A           T          15  DT151 Multitrauma
3      A           S           6 DS061A Multitrauma
4      B           S           6  DS064     Iso.TBI
5      C           S           6  DS061   Multi.TBI
6      C           S           6  DS066   Multi.TBI
7      D           S           2  DS020   Multi.TBI
8      D           S           2  DS021   Multi.TBI

Data

df <- data.frame(
   RECNUM = c("A", "A", "A", "B", "C", "C", "D", "D"),
   diag.letter = c("S", "T", "S", "S", "S", "S", "S", "S"),
   diag.number = c(6, 15, 6, 6, 6, 6, 2, 2),
   C_DIAG = c("DS060", "DT151", "DS061A", "DS064", "DS061", "DS066", "DS020", "DS021")
 )
Answer #1 (5sxhfpxr)

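This is a good candidate for dplyr::mutate() with dplyr::case_when(). Note that case_when() evaluates its conditions one after another, so the order matters. The code first uses grepl() to look for any "T" in diag.letter, because that overrides all other logic. Then, for the groups with no "T" in diag.letter, it applies the rest of the logic you described in (2) and (4) (note: n() gives the number of rows for each RECNUM):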

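library(dplyr)

df %>%
  mutate(trauma = case_when(
    # any "T" in the group overrides everything else
    any(grepl("T", diag.letter)) ~ "Multitrauma",
    # more than one row, and every row is "S" with diag.number 6
    n() > 1 & all(diag.letter == "S" & diag.number == 6) ~ "Multi.TBI",
    # more than one row in the group carries one of the listed codes
    sum(C_DIAG %in% c("DS020", "DS021", "DS029", "DS071")) > 1 ~ "Multi.TBI",
    # everything else
    TRUE ~ "Iso.TBI"
  ),
    .by = RECNUM
)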

Output:

  RECNUM diag.letter diag.number C_DIAG      trauma
1      A           S           6  DS060 Multitrauma
2      A           T          15  DT151 Multitrauma
3      A           S           6 DS061A Multitrauma
4      B           S           6  DS064     Iso.TBI
5      C           S           6  DS061   Multi.TBI
6      C           S           6  DS066   Multi.TBI
7      D           S           2  DS020   Multi.TBI
8      D           S           2  DS021   Multi.TBI
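
Why the order of the case_when() branches matters can be seen in a small, self-contained sketch. The vector x and the labels here are made up purely for illustration and are not part of the question's data; the point is only that each element receives the value of the first branch whose condition is TRUE.

```r
library(dplyr)

# Hypothetical toy input, unrelated to the trauma data
x <- c(1, 5, 10)

# Broad condition first: it matches every positive value,
# so the narrower "large" branch is never reached
broad_first <- case_when(
  x > 0 ~ "positive",
  x > 4 ~ "large",
  TRUE ~ "other"
)

# Narrow condition first: values above 4 are labelled "large",
# and only the remaining positive values fall through to "positive"
narrow_first <- case_when(
  x > 4 ~ "large",
  x > 0 ~ "positive",
  TRUE ~ "other"
)

print(broad_first)
print(narrow_first)
```

The same reasoning applies to the trauma labels: whichever of the "T" check and the Multi.TBI check comes first wins for any group that satisfies both.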
Answer #2 (yyhrrdl8)

I'm not sure all of the nesting is right, but this matches your result:

library(dplyr)

c_diag_vals = c("DS020", "DS021", "DS029", "DS071")

df |>
  mutate(trauma = case_when(
    ## more than one row AND
    n() > 1 & (
      (  ## all rows have diag.letter "S" and diag.number 6
        all(diag.letter == "S") & all(diag.number == 6)
      ) |
        ## or there is more than one unique value from c_diag_vals vector
      length(setdiff(c_diag_vals, C_DIAG)) < (length(c_diag_vals) - 1)
    ) ~ "Multi.TBI",
    ## any diag.letter is "T"
    any(diag.letter == "T") ~ "Multitrauma",
    ## else
    .default = "Iso.TBI"
  ), .by = RECNUM)

#   RECNUM diag.letter diag.number C_DIAG      trauma
# 1      A           S           6  DS060 Multitrauma
# 2      A           T          15  DT151 Multitrauma
# 3      A           S           6 DS061A Multitrauma
# 4      B           S           6  DS064     Iso.TBI
# 5      C           S           6  DS061   Multi.TBI
# 6      C           S           6  DS066   Multi.TBI
# 7      D           S           2  DS020   Multi.TBI
# 8      D           S           2  DS021   Multi.TBI
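
Since the question asks for group_by(RECNUM), here is a minimal sketch of the same idea written with group_by() and ungroup() instead of the .by argument, which may be useful on dplyr versions older than 1.1.0, where .by and .default are not available. The names tbi_codes and result are introduced here for illustration only. The sketch follows the order of the criteria as listed in the question and assumes that "more than one of the following codes" means more than one row in the group carries a listed code.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  RECNUM = c("A", "A", "A", "B", "C", "C", "D", "D"),
  diag.letter = c("S", "T", "S", "S", "S", "S", "S", "S"),
  diag.number = c(6, 15, 6, 6, 6, 6, 2, 2),
  C_DIAG = c("DS060", "DT151", "DS061A", "DS064", "DS061", "DS066", "DS020", "DS021")
)

# Codes listed in criterion (2); the variable name is an assumption
tbi_codes <- c("DS020", "DS021", "DS029", "DS071")

result <- df %>%
  group_by(RECNUM) %>%
  mutate(trauma = case_when(
    # criterion (2): more than one row, and either every row is S/6
    # or more than one row carries a listed code (assumed reading)
    n() > 1 & (all(diag.letter == "S" & diag.number == 6) |
                 sum(C_DIAG %in% tbi_codes) > 1) ~ "Multi.TBI",
    # criterion (3): any "T" in the group
    any(diag.letter == "T") ~ "Multitrauma",
    # criterion (4): everything else
    TRUE ~ "Iso.TBI"
  )) %>%
  ungroup()

print(as.data.frame(result))
```

Every condition here is a single group-level value, so all rows of a RECNUM receive the same label. Calling ungroup() at the end avoids carrying the grouping into later steps.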
