How to create a new column from two existing columns while ignoring NAs in R

Posted by mkshixfv on 2023-01-18 in Other

I have a data frame, part of which looks like this:

Domain <- c(rep("Bacteria",3),rep("Archaea", 2))
Phylum <- c("Proteobacteria","Cyanobacteria","Planctomycetota", "Thermoplasmatota", "Thermoplasmatota")
Class <- c("Alphaproteobacteria","Cyanobacteriia","Phycisphaerae","Poseidoniia_A",NA)
Order <- c("Sphingomonadales", NA, "Phycisphaerales", "Poseidoniales", NA)
Family <- c("Emcibacteraceae", NA, NA, "Poseidonia", NA)
Genus <- c("UBA4441", NA,NA,NA,NA)
Species <- c("UBA4441 sp", NA,NA,NA,NA)

demo_table <- data.frame(Domain, Phylum, Class, Order, Family, Genus, Species)

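The point is that I want to create a new column called "assignation" that pastes together the last two columns containing non-NA values, separated by a space.
Here is the expected output: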
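| Domain | Phylum | Class | Order | Family | Genus | Species | assignation |
|---|---|---|---|---|---|---|---|
| Bacteria | Proteobacteria | Alphaproteobacteria | Sphingomonadales | Emcibacteraceae | UBA4441 | UBA4441 sp | UBA4441 UBA4441 sp |
| Bacteria | Cyanobacteria | Cyanobacteriia | NA | NA | NA | NA | Cyanobacteria Cyanobacteriia |
| Bacteria | Planctomycetota | Phycisphaerae | Phycisphaerales | NA | NA | NA | Phycisphaerae Phycisphaerales |
| Archaea | Thermoplasmatota | Poseidoniia_A | Poseidoniales | Poseidonia | NA | NA | Poseidoniales Poseidonia |
| Archaea | Thermoplasmatota | NA | NA | NA | NA | NA | Archaea Thermoplasmatota |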
I thought paste() might work here, but I'm not sure how to apply it to get the expected output data frame above.
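The reason paste() alone is not enough is that it does not skip missing values: it turns NA into the literal string "NA". A small base R check on the second row of demo_table, copied here as a plain character vector (an illustration, not part of the original question), shows the difference between pasting the last two columns directly and pasting the last two non-NA values:

```r
# Second row of demo_table as a plain character vector
row2 <- c("Bacteria", "Cyanobacteria", "Cyanobacteriia", NA, NA, NA, NA)

# Pasting the last two columns directly keeps the NAs, which paste() turns into "NA"
naive <- paste(row2[6], row2[7])
print(naive)

# Dropping the NAs first and then keeping the last two values gives the wanted string
fixed <- paste(tail(na.omit(row2), 2), collapse = " ")
print(fixed)
```

The first call yields "NA NA", the second "Cyanobacteria Cyanobacteriia"; the answers below apply the second pattern to every row.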


bis0qfac1#

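We can loop over the rows in base R with apply(), remove the NAs with na.omit(), keep the last two elements with tail(n = 2), and join them with paste(collapse = " "):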

demo_table$assignation <- apply(demo_table, 1, 
   function(x) paste(tail(na.omit(x), 2), collapse = " "))
Output:
demo_table$assignation
[1] "UBA4441 UBA4441 sp"            "Cyanobacteria Cyanobacteriia"  "Phycisphaerae Phycisphaerales" "Poseidoniales Poseidonia"     
[5] "Archaea Thermoplasmatota"
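apply() calls the function once per row. For a larger table, a vectorised base R sketch (an alternative technique, not the answer's own code) can locate the last two non-NA columns of every row at once with max.col() on a 0/1 matrix of non-missing cells. It assumes every row has at least one non-NA value; an all-NA row would come out as NA:

```r
# Same data as in the question
Domain  <- c(rep("Bacteria", 3), rep("Archaea", 2))
Phylum  <- c("Proteobacteria", "Cyanobacteria", "Planctomycetota", "Thermoplasmatota", "Thermoplasmatota")
Class   <- c("Alphaproteobacteria", "Cyanobacteriia", "Phycisphaerae", "Poseidoniia_A", NA)
Order   <- c("Sphingomonadales", NA, "Phycisphaerales", "Poseidoniales", NA)
Family  <- c("Emcibacteraceae", NA, NA, "Poseidonia", NA)
Genus   <- c("UBA4441", NA, NA, NA, NA)
Species <- c("UBA4441 sp", NA, NA, NA, NA)
demo_table <- data.frame(Domain, Phylum, Class, Order, Family, Genus, Species)

m    <- as.matrix(demo_table)   # character matrix, NA cells stay NA
rows <- seq_len(nrow(m))

# 1 where a cell has a value, 0 where it is NA
ok <- (!is.na(m)) * 1

# Column of the last non-NA value in each row (ties broken towards the right)
last <- max.col(ok, ties.method = "last")

# Blank out that cell and search again for the second-to-last non-NA value
ok[cbind(rows, last)] <- 0
prev <- max.col(ok, ties.method = "last")
has_prev <- ok[cbind(rows, prev)] == 1   # FALSE when a row has only one non-NA value

demo_table$assignation <- ifelse(has_prev,
                                 paste(m[cbind(rows, prev)], m[cbind(rows, last)]),
                                 m[cbind(rows, last)])
demo_table$assignation
```

The trade-off is readability: the apply() version states the intent directly, while this one avoids a per-row function call.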

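With the tidyverse, we can also use unite() with na.rm = TRUE, so the NAs are dropped while each row is pasted together, and then extract the last two fields with a regular expression: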

library(dplyr)
library(tidyr)
library(stringr)
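demo_table %>% 
  # paste each row's non-NA values together; ";" is used because some names contain "_"
  unite(assignation, everything(), sep = ";", na.rm = TRUE, remove = FALSE) %>% 
  # keep the last two fields; the leading group is optional so two-value rows also match
  mutate(assignation = str_replace(assignation,     
     "^(?:.*;)?([^;]+);([^;]+)$", "\\1 \\2")) %>% 
  relocate(assignation, .after = last_col())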
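The regular-expression step is easiest to check on plain strings. A base R sketch with sub(..., perl = TRUE) on rows that are already joined; joining with ";" is an assumption made here (unite() uses "_" by default, and taxon names such as "Poseidoniia_A" contain "_"):

```r
# Rows already joined into one string with the (assumed) ";" separator
joined <- c("Bacteria;Proteobacteria;Alphaproteobacteria;Sphingomonadales;Emcibacteraceae;UBA4441;UBA4441 sp",
            "Archaea;Thermoplasmatota;Poseidoniia_A;Poseidoniales;Poseidonia",
            "Archaea;Thermoplasmatota")

# (?:.*;)? optionally swallows every field before the last two, so a row with
# exactly two values still matches; \\1 and \\2 are the last two fields
last_two <- sub("^(?:.*;)?([^;]+);([^;]+)$", "\\1 \\2", joined, perl = TRUE)
print(last_two)

# Without the optional prefix, a two-value row joined with "_" does not match,
# so sub() returns it unchanged
unchanged <- sub(".*_([^_]+)_([^_]+)$", "\\1 \\2", "Archaea_Thermoplasmatota")
print(unchanged)
```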

a0x5cqrl2#

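If you want a tidyverse approach, you just need rowwise() + c_across(). I also think it's a good idea to wrap the operation in a function, in case you need to reuse it later or change its behavior.
Code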

library(dplyr)

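# Paste the last n non-NA values of x, separated by spaces
select_last <- function(x, n = 2){paste(tail(na.omit(x), n = n), collapse = " ")}

demo_table %>% 
  rowwise() %>% 
  # c_across(everything()) collects the current row's values into one vector
  mutate(assignation = select_last(c_across(everything())))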

Output

# A tibble: 5 x 8
# Rowwise: 
  Domain  Phylum      Class        Order       Family     Genus Species  assignation        
  <chr>   <chr>       <chr>        <chr>       <chr>      <chr> <chr>    <chr>              
1 Bacter~ Proteobact~ Alphaproteo~ Sphingomon~ Emcibacte~ UBA4~ UBA4441~ UBA4441 UBA4441 sp 
2 Bacter~ Cyanobacte~ Cyanobacter~ NA          NA         NA    NA       Cyanobacteria Cyan~
3 Bacter~ Planctomyc~ Phycisphaer~ Phycisphae~ NA         NA    NA       Phycisphaerae Phyc~
4 Archaea Thermoplas~ Poseidoniia~ Poseidonia~ Poseidonia NA    NA       Poseidoniales Pose~
5 Archaea Thermoplas~ NA           NA          NA         NA    NA       Archaea Thermoplas~
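Because the logic sits in a function, changing its behaviour mostly means changing its arguments. A base R sketch of a hypothetical select_last_sep() (not part of the answer) that also exposes the separator, tried on the fourth row of demo_table copied as a character vector; in the pipeline it would take the place of select_last() in the same mutate() call:

```r
# Hypothetical variant of select_last() with the separator as an argument
select_last_sep <- function(x, n = 2, sep = " ") {
  paste(tail(na.omit(x), n = n), collapse = sep)
}

# Fourth row of demo_table as a plain character vector
row4 <- c("Archaea", "Thermoplasmatota", "Poseidoniia_A", "Poseidoniales", "Poseidonia", NA, NA)

two   <- select_last_sep(row4)                     # last two values, space-separated
three <- select_last_sep(row4, n = 3, sep = "; ")  # last three values, "; "-separated
print(two)
print(three)
```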

uinbv5nw3#

Here is an approach that combines dplyr and tidyr:

library(dplyr)
library(tidyr)
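demo_table %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id) %>% 
  group_by(id) %>% 
  na.omit() %>% 
  # reverse the order within each id, so value[1] is the last non-NA value
  arrange(-row_number(), .by_group = TRUE) %>% 
  # second-to-last value first, then the last one, separated by a space
  mutate(assignation = paste(value[2], value[1], sep = " ")) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(assignation) %>% 
  bind_cols(demo_table) %>% 
  relocate(assignation, .after = last_col())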
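The same reshape-then-summarise idea also works without dplyr or tidyr. In this base R sketch, row(), col() and tapply() stand in for pivot_longer(), group_by() and the per-group paste. Because the values stay in column order, tail() picks the last two directly, with no reverse sort:

```r
# Same data as in the question
Domain  <- c(rep("Bacteria", 3), rep("Archaea", 2))
Phylum  <- c("Proteobacteria", "Cyanobacteria", "Planctomycetota", "Thermoplasmatota", "Thermoplasmatota")
Class   <- c("Alphaproteobacteria", "Cyanobacteriia", "Phycisphaerae", "Poseidoniia_A", NA)
Order   <- c("Sphingomonadales", NA, "Phycisphaerales", "Poseidoniales", NA)
Family  <- c("Emcibacteraceae", NA, NA, "Poseidonia", NA)
Genus   <- c("UBA4441", NA, NA, NA, NA)
Species <- c("UBA4441 sp", NA, NA, NA, NA)
demo_table <- data.frame(Domain, Phylum, Class, Order, Family, Genus, Species)

m <- as.matrix(demo_table)

# Long format: one row per cell, with its original row (id) and column position (col)
long <- data.frame(id    = as.vector(row(m)),
                   col   = as.vector(col(m)),
                   value = as.vector(m))

# Drop the NA cells and keep each id's values in their original column order
long <- long[!is.na(long$value), ]
long <- long[order(long$id, long$col), ]

# For every id, paste the last two remaining values with a space
per_id <- tapply(long$value, long$id, function(v) paste(tail(v, 2), collapse = " "))

# Look the results up by row number; a row with no values at all would get NA
demo_table$assignation <- as.vector(per_id[as.character(seq_len(nrow(demo_table)))])
demo_table$assignation
```

The lookup by row number, rather than binding columns side by side, keeps the result aligned even if some row had no non-NA value and so dropped out of the long table.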
