How to create a data frame in R by selecting columns from a list of data frames?

fjaof16o, posted on 2023-05-20 in Other

I have many data frames stored in an object named df.list. Here are two of them as an example:

> dput(df.list$GF.1.GULUPA.1.F1)
structure(list(Evaluacion2 = c("5.1", "5.2"), Sistema = c(1, 
1), Fertilizacion = c("F1", "F1"), Asocio = c("GF", "GF"), Cultivo = c("GULUPA", 
"GULUPA"), Bloque = c(1, 1), `Altura de planta` = c(3.3, 5.1)), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))

> dput(df.list$MF.1.MORA.1.F1)
structure(list(Evaluacion2 = c("1.1", "1.2", "3.1", "3.2", "4.1", 
"4.2", "5.1", "5.2"), Sistema = c(1, 1, 1, 1, 1, 1, 1, 1), Fertilizacion = c("F1", 
"F1", "F1", "F1", "F1", "F1", "F1", "F1"), Asocio = c("MF", "MF", 
"MF", "MF", "MF", "MF", "MF", "MF"), Cultivo = c("MORA", "MORA", 
"MORA", "MORA", "MORA", "MORA", "MORA", "MORA"), Bloque = c(1, 
1, 1, 1, 1, 1, 1, 1), `Altura de planta` = c(11.3, 11.7, 42, 
62, 57.1, 51.3, 90.1, 70.2)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

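I need to extract the values of the plant height column (Altura de planta) from each data frame and label that column with the data frame's name, like this: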

library(dplyr)

df.a <- select(df.list$GF.1.GULUPA.1.F1, c("Evaluacion2", "Altura de planta"))
df.b <- select(df.list$MF.1.MORA.1.F1, c("Evaluacion2", "Altura de planta"))
names(df.a) <- c("Evaluacion2", "GF.1.GULUPA.1.F1 Altura de planta")
names(df.b) <- c("Evaluacion2", "MF.1.MORA.1.F1 Altura de planta")
df.j <- df.a %>% full_join(df.b, by = "Evaluacion2")

> dput(df.j)
structure(list(Evaluacion2 = c("5.1", "5.2", "1.1", "1.2", "3.1", 
"3.2", "4.1", "4.2"), `GF.1.GULUPA.1.F1 Altura de planta` = c(3.3, 
5.1, NA, NA, NA, NA, NA, NA), `MF.1.MORA.1.F1 Altura de planta` = c(90.1, 
70.2, 11.3, 11.7, 42, 62, 57.1, 51.3)), row.names = c(NA, -8L
), class = c("tbl_df", "tbl", "data.frame"))

I need to do this for every data frame in the df.list object. Please help, and thank you very much!

Answer #1 (wkftcu5l)

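library(purrr)
library(dplyr)

# For each data frame (.x) and its list name (.y): keep the two columns,
# then prefix the height column's name with the list name.
imap(df.list, ~ select(.x, Evaluacion2, `Altura de planta`) |>
       set_names(c("Evaluacion2", paste(.y, "Altura de planta")))) |>
  # Join all the results on the shared key column.
  reduce(full_join, by = "Evaluacion2")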

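How it works

  1. imap iterates over df.list, where .x is the list element (here, a data frame) and .y is the element's name (e.g., "GF.1.GULUPA.1.F1").
  2. select keeps the columns you need, then set_names renames them, using .y (the list name) as a prefix for the height column.
  3. reduce walks through the resulting list of data frames, full-joining them one after another, and returns a single data frame.
    Note: if you decide to select more than these two columns, a more dynamic option is to rename only the target column, Altura de planta, instead of setting the names of all columns. To do so, replace the set_names line with: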
rename(!!sym(paste(.y, "Altura de planta")) := `Altura de planta`)
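
As a rough sketch of how that variant could look end to end, the example below keeps one extra column (Bloque, chosen here purely for illustration) and renames only the height column. The data are the question's values with the other columns dropped, and joining on both Evaluacion2 and Bloque is an assumption about which columns should act as keys.

```r
library(dplyr)
library(purrr)

# Values from the question; only Bloque is kept as an example extra column.
df.list <- list(
  GF.1.GULUPA.1.F1 = tibble(
    Evaluacion2 = c("5.1", "5.2"),
    Bloque = c(1, 1),
    `Altura de planta` = c(3.3, 5.1)
  ),
  MF.1.MORA.1.F1 = tibble(
    Evaluacion2 = c("1.1", "1.2", "3.1", "3.2", "4.1", "4.2", "5.1", "5.2"),
    Bloque = rep(1, 8),
    `Altura de planta` = c(11.3, 11.7, 42, 62, 57.1, 51.3, 90.1, 70.2)
  )
)

joined <- imap(df.list, function(df, nm) {
  # Build the new column name from the list element's name.
  new_name <- paste(nm, "Altura de planta")
  df |>
    select(Evaluacion2, Bloque, `Altura de planta`) |>
    # Rename only the target column; the other columns keep their names.
    rename(!!new_name := `Altura de planta`)
}) |>
  # Assumption: Evaluacion2 and Bloque together identify a row.
  reduce(full_join, by = c("Evaluacion2", "Bloque"))
```

Listing the shared columns in by keeps them from being duplicated with .x/.y suffixes in the joined result.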

As mentioned in @M's comment, you can also rename directly inside the select statement for more concise code.
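
The comment itself is not reproduced here, so the following is only one possible way to rename inside select, not necessarily the one the commenter had in mind. It relies on all_of() accepting a named character vector, where the names become the new column names. The data are the question's values with the unused columns dropped, and new_col is a hypothetical helper name.

```r
library(dplyr)
library(purrr)

# Values from the question, reduced to the two columns of interest.
df.list <- list(
  GF.1.GULUPA.1.F1 = tibble(
    Evaluacion2 = c("5.1", "5.2"),
    `Altura de planta` = c(3.3, 5.1)
  ),
  MF.1.MORA.1.F1 = tibble(
    Evaluacion2 = c("1.1", "1.2", "3.1", "3.2", "4.1", "4.2", "5.1", "5.2"),
    `Altura de planta` = c(11.3, 11.7, 42, 62, 57.1, 51.3, 90.1, 70.2)
  )
)

result <- imap(df.list, function(df, nm) {
  # Named vector: name = new column name, value = existing column name.
  new_col <- setNames("Altura de planta", paste(nm, "Altura de planta"))
  # select() picks Evaluacion2 and picks-and-renames the height column.
  select(df, Evaluacion2, all_of(new_col))
}) |>
  reduce(full_join, by = "Evaluacion2")
```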

Output

Evaluacion2 `GF.1.GULUPA.1.F1 Altura de planta` `MF.1.MORA.1.F1 Altura de planta`
1 5.1                                         3.3                              90.1
2 5.2                                         5.1                              70.2
3 1.1                                        NA                                11.3
4 1.2                                        NA                                11.7
5 3.1                                        NA                                42  
6 3.2                                        NA                                62  
7 4.1                                        NA                                57.1
8 4.2                                        NA                                51.3

Data

df.list <- list(GF.1.GULUPA.1.F1 = structure(list(Evaluacion2 = c("5.1", 
"5.2"), Sistema = c(1, 1), Fertilizacion = c("F1", "F1"), Asocio = c("GF", 
"GF"), Cultivo = c("GULUPA", "GULUPA"), Bloque = c(1, 1), `Altura de planta` = c(3.3, 
5.1)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), MF.1.MORA.1.F1 = structure(list(Evaluacion2 = c("1.1", "1.2", 
"3.1", "3.2", "4.1", "4.2", "5.1", "5.2"), Sistema = c(1, 1, 
1, 1, 1, 1, 1, 1), Fertilizacion = c("F1", "F1", "F1", "F1", 
"F1", "F1", "F1", "F1"), Asocio = c("MF", "MF", "MF", "MF", "MF", 
"MF", "MF", "MF"), Cultivo = c("MORA", "MORA", "MORA", "MORA", 
"MORA", "MORA", "MORA", "MORA"), Bloque = c(1, 1, 1, 1, 1, 1, 
1, 1), `Altura de planta` = c(11.3, 11.7, 42, 62, 57.1, 51.3, 
90.1, 70.2)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", 
"data.frame")))

Answer #2 (0md85ypi)

You can first bind the data frames with purrr::list_rbind(), which conveniently stores each data frame's name in a column of the resulting data frame. From there, pivot wider.

library(dplyr)
df.list <- list(GF.1.GULUPA.1.F1 = GF.1.GULUPA.1.F1, 
                MF.1.MORA.1.F1 = MF.1.MORA.1.F1)
df.list %>% 
  purrr::list_rbind(names_to = "plant") %>% 
  select(-c(Sistema, Fertilizacion, Asocio, Cultivo, Bloque)) %>% 
  tidyr::pivot_wider(names_from = plant, 
              values_from = `Altura de planta`, 
              names_glue = "{plant} Altura de planta") 
#> # A tibble: 8 × 3
#>   Evaluacion2 `GF.1.GULUPA.1.F1 Altura de planta` MF.1.MORA.1.F1 Altura de pla…¹
#>   <chr>                                     <dbl>                          <dbl>
#> 1 5.1                                         3.3                           90.1
#> 2 5.2                                         5.1                           70.2
#> 3 1.1                                        NA                             11.3
#> 4 1.2                                        NA                             11.7
#> 5 3.1                                        NA                             42  
#> 6 3.2                                        NA                             62  
#> 7 4.1                                        NA                             57.1
#> 8 4.2                                        NA                             51.3
#> # ℹ abbreviated name: ¹​`MF.1.MORA.1.F1 Altura de planta`
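
If the data frames in the list carry other or additional columns, a variant that names the columns to keep, rather than the ones to drop, may be more robust. This is only a sketch under that assumption; it uses the question's values with just Bloque retained as a stand-in for the extra columns, and it needs purrr 1.0.0 or later for list_rbind().

```r
library(dplyr)
library(purrr)
library(tidyr)

# Values from the question; Bloque stands in for the columns to be discarded.
df.list <- list(
  GF.1.GULUPA.1.F1 = tibble(
    Evaluacion2 = c("5.1", "5.2"),
    Bloque = c(1, 1),
    `Altura de planta` = c(3.3, 5.1)
  ),
  MF.1.MORA.1.F1 = tibble(
    Evaluacion2 = c("1.1", "1.2", "3.1", "3.2", "4.1", "4.2", "5.1", "5.2"),
    Bloque = rep(1, 8),
    `Altura de planta` = c(11.3, 11.7, 42, 62, 57.1, 51.3, 90.1, 70.2)
  )
)

wide <- df.list |>
  # Stack the data frames; the list names go into a new column, plant.
  list_rbind(names_to = "plant") |>
  # Keep only the key, the label, and the value column.
  select(plant, Evaluacion2, `Altura de planta`) |>
  # One output column per data frame name.
  pivot_wider(
    names_from = plant,
    values_from = `Altura de planta`,
    names_glue = "{plant} Altura de planta"
  )
```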

Input data:

GF.1.GULUPA.1.F1 <- structure(list(Evaluacion2 = c("5.1", "5.2"), Sistema = c(
  1,
  1
), Fertilizacion = c("F1", "F1"), Asocio = c("GF", "GF"), Cultivo = c(
  "GULUPA",
  "GULUPA"
), Bloque = c(1, 1), `Altura de planta` = c(3.3, 5.1)), row.names = c(
  NA,
  -2L
), class = c("tbl_df", "tbl", "data.frame"))

MF.1.MORA.1.F1 <- structure(list(Evaluacion2 = c(
  "1.1", "1.2", "3.1", "3.2", "4.1",
  "4.2", "5.1", "5.2"
), Sistema = c(1, 1, 1, 1, 1, 1, 1, 1), Fertilizacion = c(
  "F1",
  "F1", "F1", "F1", "F1", "F1", "F1", "F1"
), Asocio = c(
  "MF", "MF",
  "MF", "MF", "MF", "MF", "MF", "MF"
), Cultivo = c(
  "MORA", "MORA",
  "MORA", "MORA", "MORA", "MORA", "MORA", "MORA"
), Bloque = c(
  1,
  1, 1, 1, 1, 1, 1, 1
), `Altura de planta` = c(
  11.3, 11.7, 42,
  62, 57.1, 51.3, 90.1, 70.2
)), row.names = c(NA, -8L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))

Created on 2023-05-11 with reprex v2.0.2
