R: How can I use a loop or function to simplify my program?

btxsgosb  posted on 2023-05-04 in Other

Here is my code:

## Extract different phecode class
#1 Subset for circulatory system
circulatory_system <- per_class %>% filter(exclude_name == "circulatory system")

#2 Subset for dermatologic
dermatologic <- per_class %>% filter(exclude_name == "dermatologic")

#3 Subset for endocrine/metabolic
endocrine_metabolic <- per_class %>% filter(exclude_name == "endocrine/metabolic")

#4 Subset for genitourinary
genitourinary <- per_class %>% filter(exclude_name == "genitourinary")

#5 Subset for infectious diseases
infectious_diseases <- per_class %>% filter(exclude_name == "infectious diseases")

## List all phecode class
data_list <- list(
  circulatory_system = list(df = circulatory_system, exclude_name = "circulatory system"),
  dermatologic = list(df = dermatologic, exclude_name = "dermatologic"),
  endocrine_metabolic = list(df = endocrine_metabolic, exclude_name = "endocrine/metabolic"),
  genitourinary = list(df = genitourinary, exclude_name = "genitourinary"),
  infectious_diseases = list(df = infectious_diseases, exclude_name = "infectious diseases")
)

Is there a simpler way to build a data_list in the same format? I have more than 15 phecode classes, so writing each one out by hand gets messy.
Thanks, everyone.


f8rj6qna1#

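There are several ways to do this, but if you have control over the actual data structure used for data_list, I recommend tidyr::nest(), because it is more concise.
The output of tidyr::nest() is a tibble with two columns: the column that was "pulled out" (exclude_name in this case), and df, a list column holding the filtered data frames (minus the pulled-out column).
Below is an example of all three options, using the iris dataset and its Species column.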

data(iris)

# Option 1: Same format, but using purrr
all_species <- unique(iris$Species)
data_list1 <- purrr::set_names(all_species) |>
  purrr::map(\(species) list(
    df = iris |> dplyr::filter(Species == species),
    Species = species
  ))

# Option 2: Same format, but using base R
data_list2 <- split(iris, ~Species) |>
  lapply(\(split) list(df = split, Species = split$Species[[1L]]))

bench::mark(
  purr = purrr::set_names(all_species) |>
    purrr::map(\(species) list(
      df = iris |> dplyr::filter(Species == species),
      Species = species
    )),
  base = split(iris, ~Species) |>
    lapply(\(split) list(df = `row.names<-`(split, NULL), Species = split$Species[[1L]]))
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 purr         2.27ms   2.31ms      432.    37.9KB     12.6
#> 2 base       244.62µs 252.38µs     3946.    23.5KB     19.1

# Option 3: Slightly different format using tidyr
data_nested <- iris |>
  tidyr::nest(df = -Species)

data_nested
#> # A tibble: 3 × 2
#>   Species    df               
#>   <fct>      <list>           
#> 1 setosa     <tibble [50 × 4]>
#> 2 versicolor <tibble [50 × 4]>
#> 3 virginica  <tibble [50 × 4]>

bench::mark(
  base = tibble::tibble(
    Species = all_species,
    df = unname(lapply(split(iris, ~Species), \(x) tibble::as_tibble(x[, -5L])))
  ),
  tidyr = iris |>
    tidyr::nest(df = -Species)
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 base         1.18ms   1.21ms      825.   150.1KB     14.7
#> 2 tidyr        4.79ms   4.83ms      205.    32.5KB     13.1

Created on 2023-04-30 with reprex v2.0.2
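To connect the iris example back to the question, here is a minimal sketch of Option 2 applied to a data frame shaped like per_class. The toy per_class below (including its phecode column and values) is made up for illustration; only the exclude_name column comes from the question. The gsub() step is an assumption about how the list names should look, chosen to match the hand-written names in the question.

```r
# Hypothetical stand-in for the asker's per_class; only exclude_name is from the question.
per_class <- data.frame(
  phecode = c("401", "427", "696", "250"),
  exclude_name = c("circulatory system", "circulatory system",
                   "dermatologic", "endocrine/metabolic"),
  stringsAsFactors = FALSE
)

# One data frame per distinct exclude_name value.
groups <- split(per_class, per_class$exclude_name)

# Wrap each subset in the list(df = ..., exclude_name = ...) format from the question.
data_list <- lapply(groups, function(d) {
  list(df = d, exclude_name = d$exclude_name[[1L]])
})

# Replace spaces and slashes so the names are valid R identifiers.
names(data_list) <- gsub("[ /]", "_", names(data_list))

str(data_list, max.level = 2)
```

Because the groups come from the data itself, adding a sixteenth phecode class needs no code change.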


46qrfjad2#

Consider base R's by (an object-oriented wrapper for tapply), which returns a special named list for a single-factor split:

data_list <- by(
  per_class, 
  per_class$exclude_name, 
  FUN = \(sub) list(df=sub, exclude_name=sub$exclude_name[1])
)

To replace spaces and special characters in the names with underscores:

names(data_list) <- gsub(" |/", "_", names(data_list))
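As a rough end-to-end check of the by() approach, the sketch below runs it on a small made-up per_class (the phecode column and all values are hypothetical; only exclude_name is from the question) and then looks up one class by its cleaned name.

```r
# Hypothetical stand-in for the asker's per_class; only exclude_name is from the question.
per_class <- data.frame(
  phecode = c("401", "427", "696", "250"),
  exclude_name = c("circulatory system", "circulatory system",
                   "dermatologic", "endocrine/metabolic"),
  stringsAsFactors = FALSE
)

# Split by exclude_name and wrap each subset in the question's list format.
data_list <- by(
  per_class,
  per_class$exclude_name,
  FUN = function(sub) list(df = sub, exclude_name = sub$exclude_name[1])
)

# Clean the names so elements can be looked up without spaces or slashes.
names(data_list) <- gsub(" |/", "_", names(data_list))

# Look up one class by its cleaned name.
circ <- data_list[["circulatory_system"]]
circ$exclude_name
nrow(circ$df)
```

Note that the result carries the class "by" rather than being a plain list, so code that checks for a bare list may need adjusting.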
