Using purrr map to iterate over a list of data frames, extract columns, and create a new data frame

wz3gfoph  posted on 2023-03-05 in Other

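I am practicing with the purrr package but am not yet comfortable with it. I tried to use purrr's map() function to iterate over a list of data frames and select specific columns to build a new data frame. I tried the following code, but it does not work: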

library(tidyverse)
    
df1 <- data.frame(x = 1:5, col1 = letters[1:5])
df2 <- data.frame(x = 5:10, col2 = letters[5:10])
df3 <- data.frame(x = 10:15, col3 = letters[10:15])

list_dataframes <- list(df1, df2, df3)

# would like to do something like this
new_dataframe <- map_df(list_dataframes, ~select(., c("col1", "col2", "col3")))
#> Error in `map()`:
#> i In index: 1.
#> Caused by error in `select()`:
#> ! Can't subset columns that don't exist.
#> x Column `col2` doesn't exist.

#> Backtrace:
#>      x
#>   1. +-purrr::map_df(list_dataframes, ~select(., c("col1", "col2", "col3")))
#>   2. | \-purrr::map(.x, .f, ...)
#>   3. |   \-purrr:::map_("list", .x, .f, ..., .progress = .progress)
#>   4. |     +-purrr:::with_indexed_errors(...)
#>   5. |     | \-base::withCallingHandlers(...)
#>   6. |     +-purrr:::call_with_cleanup(...)
#>   7. |     \-global .f(.x[[i]], ...)
#>   8. |       +-dplyr::select(., c("col1", "col2", "col3"))
#>   9. |       \-dplyr:::select.data.frame(., c("col1", "col2", "col3"))
#>  10. |         \-tidyselect::eval_select(expr(c(...)), data = .data, error_call = error_call)
#>  11. |           \-tidyselect:::eval_select_impl(...)
#>  12. |             +-tidyselect:::with_subscript_errors(...)
#>  13. |             | \-rlang::try_fetch(...)
#>  14. |             |   \-base::withCallingHandlers(...)
#>  15. |             \-tidyselect:::vars_select_eval(...)
#>  16. |               \-tidyselect:::walk_data_tree(expr, data_mask, context_mask)
#>  17. |                 \-tidyselect:::eval_c(expr, data_mask, context_mask)
#>  18. |                   \-tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
#>  19. |                     \-tidyselect:::walk_data_tree(new, data_mask, context_mask)
#>  20. |                       \-tidyselect:::eval_c(expr, data_mask, context_mask)
#>  21. |                         \-tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
#>  22. |                           \-tidyselect:::walk_data_tree(new, data_mask, context_mask)
#>  23. |                             \-tidyselect:::as_indices_sel_impl(...)
#>  24. |                               \-tidyselect:::as_indices_impl(...)
#>  25. |                                 \-tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
#>  26. |                                   \-vctrs::vec_as_location(...)
#>  27. \-vctrs (local) `<fn>`()
#>  28.   \-vctrs:::stop_subscript_oob(...)
#>  29.     \-vctrs:::stop_subscript(...)
#>  30.       \-rlang::abort(...)

Created on 2023-03-01 with reprex v2.0.2
LE: map_df(list_dataframes, ~select_if(., is.character)) runs, but it does not bind the columns the way I want.
Any help or insight would be greatly appreciated!

kyxcudwk  #1

You can use map_df() like this (it binds the results by row):

library(purrr)
library(dplyr)
map_df(list_dataframes, ~select(., any_of(c("col1", "col2", "col3"))))

   col1 col2 col3
1     a <NA> <NA>
2     b <NA> <NA>
3     c <NA> <NA>
4     d <NA> <NA>
5     e <NA> <NA>
6     f <NA> <NA>
7  <NA>    e <NA>
8  <NA>    f <NA>
9  <NA>    g <NA>
10 <NA>    h <NA>
11 <NA>    i <NA>
12 <NA>    j <NA>
13 <NA> <NA>    j
14 <NA> <NA>    k
15 <NA> <NA>    l
16 <NA> <NA>    m
17 <NA> <NA>    n
18 <NA> <NA>    o
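
As a side note, map_df() is superseded as of purrr 1.0.0, and the same row-binding can be written as map() followed by list_rbind(). The following is only a minimal sketch, assuming purrr 1.0.0 or later, R 4.1 or later (for `|>` and `\(df)`), and the six-row df1 that produced the output above; the names `wanted` and `rows_bound` are illustrative and not from the original post.

```r
library(purrr)
library(dplyr)

# Six-row df1, matching the 18-row output shown above (assumption)
df1 <- data.frame(x = 1:6, col1 = letters[1:6])
df2 <- data.frame(x = 5:10, col2 = letters[5:10])
df3 <- data.frame(x = 10:15, col3 = letters[10:15])
list_dataframes <- list(df1, df2, df3)

# Hypothetical helper vector holding the column names of interest
wanted <- c("col1", "col2", "col3")

rows_bound <- list_dataframes |>
  # any_of() keeps only the requested columns that exist in each data frame
  map(\(df) select(df, any_of(wanted))) |>
  # row-bind the pieces; columns missing from a piece are filled with NA
  list_rbind()

dim(rows_bound)
```

Because each piece carries a different column, the row-bound result has the same staircase of NA values as the map_df() output.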

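Alternatively, if all the data frames in the list have exactly the same number of rows, you can use bind_cols(). (For this I changed df1 to df1 <- data.frame(x = 1:6, col1 = letters[1:6]); the 18-row output above was also produced with this six-row df1.)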

new_dataframe <- map(list_dataframes, ~select(., any_of(c("col1", "col2", "col3"))))
new_dataframe |> bind_cols()

  col1 col2 col3
1    a    e    j
2    b    f    k
3    c    g    l
4    d    h    m
5    e    i    n
6    f    j    o
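
bind_cols() needs every piece to have the same number of rows, which is why df1 had to be extended. With the original five-row df1 from the question the row counts differ. Below is a hedged sketch of a guard that checks the counts before binding; `wanted`, `pieces`, `row_counts`, and `combined` are illustrative names, not part of the original answer.

```r
library(purrr)
library(dplyr)

# Original data from the question: df1 has 5 rows, the others have 6
df1 <- data.frame(x = 1:5, col1 = letters[1:5])
df2 <- data.frame(x = 5:10, col2 = letters[5:10])
df3 <- data.frame(x = 10:15, col3 = letters[10:15])
list_dataframes <- list(df1, df2, df3)

wanted <- c("col1", "col2", "col3")
pieces <- map(list_dataframes, \(df) select(df, any_of(wanted)))

# Number of rows in each selected piece
row_counts <- map_int(pieces, nrow)

if (n_distinct(row_counts) == 1) {
  # Safe to bind side by side
  combined <- bind_cols(pieces)
} else {
  # Binding by column would fail here, so report the mismatch instead
  combined <- NULL
  message("Row counts differ: ", paste(row_counts, collapse = ", "))
}
```

When the counts differ, binding by row (as in the first approach) is the option that still works.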

btqmn9zl  #2

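I think the problem is that the name "col1" (or col2, or col3) does not exist in every data frame, so select() fails on the first data frame that lacks one of them. Selecting by prefix avoids this: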

new_dataframe <- map_df(list_dataframes, ~ {
  # starts_with() keeps whichever "col*" column each data frame actually has
  .x %>% select(starts_with("col"))
})
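
The difference between strict and lenient selection can be seen on a single data frame. This is a small sketch using the question's df1: a plain character vector of names raises an error when a column is missing, while any_of() and starts_with() simply skip it. The variable names `strict`, `lenient`, and `by_prefix` are illustrative.

```r
library(dplyr)

df1 <- data.frame(x = 1:5, col1 = letters[1:5])

# Strict: a bare character vector must match existing columns, so this errors;
# tryCatch() turns the error into a marker string for inspection
strict <- tryCatch(
  select(df1, c("col1", "col2")),
  error = function(e) "error"
)

# Lenient: any_of() ignores names that are not present
lenient <- select(df1, any_of(c("col1", "col2")))

# Lenient: starts_with() matches by prefix, whatever columns exist
by_prefix <- select(df1, starts_with("col"))

names(lenient)
names(by_prefix)
```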
