Handling REDCap multiple-choice checkboxes


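When I export my REDCap data to RStudio, the answers to a multiple-choice checkbox field are split into separate variables instead of staying in a single variable as they do in REDCap. For example, the race question lets a person select every race that applies to them, but in the exported data race becomes 8 new variables, one per checkbox option.
This is the only workable solution I have found, but it is time-consuming and needs a lot of quality checking, so I would love to find a more automated way to handle it.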

library(dplyr)

cleandata <- data %>% 
  # Columns 85:92 are the eight race___* checkbox variables in this export
  mutate(race_cat = rowSums(data[,85:92]),    
         # Conditions are evaluated in order; the first match wins
         new_race = case_when(race_cat == 1 & race___5 == 1 & ethnicity != 1 ~ "White",
                              race_cat == 1 & race___2 == 1 & ethnicity != 1 ~ "Asian",
                              race_cat == 1 & race___3 == 1 & ethnicity != 1 ~ "Black",
                              race_cat == 1 & race___7 == 1 & ethnicity != 1 ~ "Other",
                              race_cat == 1 & race___1 == 1 & ethnicity != 1 ~ "AIAN",
                              race_cat == 1 & race___4 == 1 & ethnicity != 1 ~ "NHPI",
                              race_cat == 1 & race___8 == 1 & ethnicity != 1 ~ "Unknown",
                              race_cat == 1 & race___6 == 1 & ethnicity != 1 ~ "Multi-Racial",
                              race_cat == 0 & ethnicity != 1 ~ "Unknown",
                              ethnicity == 1 ~ "Hispanic",
                              race_cat > 1 ~ "Multi-Racial"))

# Quality check: tabulate the derived categories
table(cleandata$new_race)

I tried the REDCapR package hoping it would solve this, but I haven't figured it out yet.

Answer #1 (qyyhg6bp)

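This is a hard thing to generalize, because there is no guarantee that the race field will always be defined the way you defined it. Specifically, what if the next project defines race___1 = White instead of race___5?
There are a few things you can probably do to make this transformation a bit easier.
1. Apply the matching label to subjects with exactly one selection
2. Apply the Multi-Racial label to subjects with more than one selection
3. Apply the Unknown label to subjects with no selection
4. Apply the Hispanic label to Hispanic subjects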
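As a rough illustration, the four rules can be sketched in base R on a toy data set. The data frame `toy`, its three checkbox columns, and the `labels` vector are made up for this example and are not taken from the original project; each rule overwrites the previous one, so Hispanic ethnicity ends up taking precedence, as in the question's `case_when`.

```r
# Toy data (hypothetical): three checkbox columns plus an ethnicity field
toy <- data.frame(
  race___1  = c(1, 0, 1, 0, 0),
  race___2  = c(0, 1, 1, 0, 0),
  race___3  = c(0, 0, 0, 0, 1),
  ethnicity = c(0, 0, 0, 0, 1)
)

# Hypothetical mapping from checkbox column to label
labels <- c(race___1 = "AIAN", race___2 = "Asian", race___3 = "Black")

checks <- as.matrix(toy[names(labels)])
n_checked <- rowSums(checks)

# Rule 1: take the label of the checked box (max.col finds the column holding the 1)
new_race <- unname(labels[max.col(checks, ties.method = "first")])
# Rule 2: more than one box checked
new_race[n_checked > 1] <- "Multi-Racial"
# Rule 3: no box checked
new_race[n_checked == 0] <- "Unknown"
# Rule 4: Hispanic ethnicity overrides everything above
new_race[toy$ethnicity %in% 1] <- "Hispanic"

print(new_race)
```

Using `%in% 1` rather than `== 1` keeps records with a missing ethnicity from producing `NA` in the index.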
I might approach it like this:

categorize_race <- function(data,
                            output_name = "new_race",
                            race_map = list(White = "race___5"),
                            race_field = "race",
                            ethnicity_field = "ethnicity", 
                            ethnicity_hispanic = 1){
  # Identify fields with race categories (e.g. race___1, race___2, ...)
  regex <- sprintf("^%s___.+$", race_field)
  race_fields <- names(data)[grepl(regex, names(data))]
  
  # Get the number of race categories checked for each record
  race_sum <- rowSums(data[race_fields], na.rm = TRUE)
  
  # Identify hispanic, multiracial, and unknown
  # (%in% treats a missing ethnicity as "not Hispanic" instead of NA)
  is_hispanic <- data[[ethnicity_field]] %in% ethnicity_hispanic
  is_multiracial <- race_sum > 1
  is_unknown <- race_sum == 0

  # Match the checkbox value to the label, using only the mapped fields
  # so that unmapped 0/1 columns are not pasted into the result
  RaceLabel <- data[unlist(race_map, use.names = FALSE)]
  for (i in seq_along(race_map)){
    # race_map[[i]] extracts the column name; race_map[i] would be a list
    RaceLabel[[race_map[[i]]]] <- ifelse(RaceLabel[[race_map[[i]]]] %in% 1, 
                                         names(race_map)[i], 
                                         "")
  }
  
  # Collapse all of the labels into a single string
  # subjects with multiple labels will have a messy value, 
  # but that will be overwritten with "Multi-Racial"
  new_race_value <- apply(RaceLabel, 1, paste0, collapse = "")
  new_race_value[is_multiracial] <- "Multi-Racial"
  new_race_value[is_unknown] <- "Unknown"
  # Assigned last so Hispanic takes precedence, as in the question's case_when
  new_race_value[is_hispanic] <- "Hispanic"
  
  data[[output_name]] <- new_race_value
  
  data
}

cleandata <- data %>% 
  categorize_race(race_map = list("White" = "race___5", 
                                  "Asian" = "race___2", 
                                  "Black" = "race___3", 
                                  "Other" = "race___7", 
                                  "AIAN" = "race___1", 
                                  "NHPI" = "race___4", 
                                  "Unknown" = "race___8", 
                                  "Multi-Racial" = "race___6"))

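This function gives you full control over the mapping of options to labels, the field in which race is stored, and the coding that maps to Hispanic.
I haven't tested it, so it may need some tweaking to get it working to your liking.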
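Because the mapping is the part that changes from project to project, one option is to build `race_map` from the project's data dictionary instead of typing it by hand. The sketch below assumes the choices are available as a single string in the `code, label | code, label` form used by the `select_choices_or_calculations` column of a REDCap data dictionary. The `choices` string and the helper `build_checkbox_map` are hypothetical names for this example, and the helper assumes simple codes that appear unchanged in the exported column names.

```r
# Hypothetical choices string in the data dictionary's "code, label | ..." form
choices <- "1, AIAN | 2, Asian | 3, Black | 4, NHPI | 5, White"

build_checkbox_map <- function(choices, field = "race"){
  # Split into "code, label" pairs
  pairs <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
  # Everything before the first comma is the code, everything after is the label
  codes <- trimws(sub(",.*$", "", pairs))
  labels <- trimws(sub("^[^,]*,", "", pairs))
  # Checkbox columns are exported as <field>___<code>
  as.list(setNames(paste0(field, "___", codes), labels))
}

race_map <- build_checkbox_map(choices)
str(race_map)
```

The resulting named list has the same shape as the `race_map` argument of `categorize_race`, so a project that codes White as 1 instead of 5 would only need a different choices string.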
