R: remove rows that are NA in all of several columns

eulz3vhy, posted on 2024-01-03 in Other

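There are many questions about removing rows that contain NA (drop_na()), but I haven't found one that covers my particular need. I want to remove rows that are NA in every one of 3 specific columns. In this case, remove rows that have NA in value2, value3 and value4, but keep rows that have NA in fewer than all three of them, or NA only in other columns.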

sample <- c("sample1", "sample2", "sample3", "sample4", "sample5")
value1 <- c("A", "B", "A", NA, "C")
value2 <- c(NA, 5, 7, NA, NA)
value3 <- c(13, NA, 7, NA, NA)
value4 <- c(11, 4, NA, 9, NA)
myinput <- data.frame(sample, value1, value2, value3, value4)

myinput
   sample value1 value2 value3 value4
1 sample1      A     NA     13     11
2 sample2      B      5     NA      4
3 sample3      A      7      7     NA
4 sample4   <NA>     NA     NA      9
5 sample5      C     NA     NA     NA

Desired output:

   sample value1 value2 value3 value4
1 sample1      A     NA     13     11
2 sample2      B      5     NA      4
3 sample3      A      7      7     NA
4 sample4   <NA>     NA     NA      9


Thanks!
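For contrast, a minimal base R sketch of the difference between the two rules. tidyr::drop_na() restricted to value2:value4 applies an "any NA" rule (approximated here with base complete.cases()); because every row has at least one NA in those columns, that rule would drop all five rows. The "all NA" rule asked for drops only sample5. The names cols, any_na and all_na are illustrative.

```r
# Rebuild the example data from the question
myinput <- data.frame(
  sample = paste0("sample", 1:5),
  value1 = c("A", "B", "A", NA, "C"),
  value2 = c(NA, 5, 7, NA, NA),
  value3 = c(13, NA, 7, NA, NA),
  value4 = c(11, 4, NA, 9, NA)
)
cols <- c("value2", "value3", "value4")

# "Any NA" rule: a row counts as incomplete if at least one column is NA
any_na <- !complete.cases(myinput[cols])   # TRUE for all five rows
# "All NA" rule: a row is dropped only if every selected column is NA
all_na <- rowSums(is.na(myinput[cols])) == length(cols)  # TRUE only for row 5

myinput[!all_na, ]  # sample1 to sample4, the desired output
```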

yzuktlbb, answer #1

collapse::na_omit

collapse::na_omit(myinput, cols = paste0("value", 2:4), prop = 1)

#>    sample value1 value2 value3 value4
#> 1 sample1      A     NA     13     11
#> 2 sample2      B      5     NA      4
#> 3 sample3      A      7      7     NA
#> 4 sample4   <NA>     NA     NA      9
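Here prop = 1 means a row is removed only when the share of missing values in cols reaches 1, that is, when all three columns are NA. A rough base R analogue of that threshold, assuming a row is dropped when its NA share is at least prop; drop_na_prop() is a hypothetical helper, not part of collapse, and its behaviour for other prop values is not guaranteed to match na_omit().

```r
# Example data from the question
myinput <- data.frame(
  sample = paste0("sample", 1:5),
  value1 = c("A", "B", "A", NA, "C"),
  value2 = c(NA, 5, 7, NA, NA),
  value3 = c(13, NA, 7, NA, NA),
  value4 = c(11, 4, NA, 9, NA)
)

# Hypothetical helper: drop rows whose fraction of NA values across `cols`
# is at least `prop` (assumes 0 < prop <= 1)
drop_na_prop <- function(df, cols, prop = 1) {
  na_share <- rowMeans(is.na(df[cols]))  # fraction of NA per row, in [0, 1]
  df[na_share < prop, , drop = FALSE]
}

cols <- paste0("value", 2:4)
drop_na_prop(myinput, cols, prop = 1)    # drops only sample5
drop_na_prop(myinput, cols, prop = 0.5)  # also drops sample4 (2 of 3 NA)
```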


Benchmark on a 1,000,000 x 21 dataset:

collapse is about 5 to 6 times faster:

#Using data from https://stackoverflow.com/a/48830183/13460602
library(dplyr)
library(collapse)

# Every expression drops only the rows in which all 20 columns 2:21 are NA,
# so the rowSums() threshold is 20 (the number of columns), not 3
mb <- microbenchmark::microbenchmark(
  rowSums = df[rowSums(is.na(df[, 2:21])) < 20, ],
  "dplyr+rowSums" = df %>%
    filter(rowSums(is.na(pick(2:21))) < 20),
  if_all = df |>
    filter(!if_all(2:21, is.na)),
  collapse = na_omit(df, cols = 2:21, prop = 1)
)
print(mb)
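Whatever the data, the rowSums() threshold has to equal the number of selected columns (20 for columns 2:21) for that variant to return the same rows as if_all() and prop = 1. A quick base R check on hypothetical synthetic data (an id column plus 20 mostly-NA columns, not the data from the linked answer):

```r
set.seed(1)
n <- 1000
# Hypothetical data: id column plus 20 numeric columns, each value NA with
# probability 0.9, so some rows are NA in all of columns 2:21
df <- data.frame(
  id = seq_len(n),
  matrix(ifelse(runif(n * 20) < 0.9, NA, 1), ncol = 20)
)
cols <- 2:21

na_count   <- rowSums(is.na(df[, cols]))
keep_count <- na_count < length(cols)                  # fewer than 20 NAs
keep_all   <- !Reduce(`&`, lapply(df[, cols], is.na))  # not all NA

identical(unname(keep_count), keep_all)  # TRUE: both rules agree
sum(!keep_all)                           # number of all-NA rows dropped
sum(na_count < 3)  # the `< 3` rule keeps only rows with 0 to 2 NAs instead
```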



w51jfk4q, answer #2

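We can use is.na and rowSums to count the NAs in the three columns of each row, then keep only the rows where that count is below 3.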

base R

myinput[rowSums(is.na(myinput[, c("value2", "value3", "value4")])) < 3,]
#    sample value1 value2 value3 value4
# 1 sample1      A     NA     13     11
# 2 sample2      B      5     NA      4
# 3 sample3      A      7      7     NA
# 4 sample4   <NA>     NA     NA      9
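To see what the comparison operates on, the per-row NA counts for the example data are shown below; deriving the threshold from length(cols) instead of writing 3 keeps the filter correct if the column list changes. cols, na_count and result are illustrative names.

```r
# Example data from the question
myinput <- data.frame(
  sample = paste0("sample", 1:5),
  value1 = c("A", "B", "A", NA, "C"),
  value2 = c(NA, 5, 7, NA, NA),
  value3 = c(13, NA, 7, NA, NA),
  value4 = c(11, 4, NA, 9, NA)
)
cols <- c("value2", "value3", "value4")

# Number of NAs per row in the selected columns: 1 1 1 2 3
na_count <- rowSums(is.na(myinput[cols]))

# Keep rows with fewer NAs than selected columns (no hard-coded 3)
result <- myinput[na_count < length(cols), ]
result
```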


dplyr

library(dplyr)
myinput %>%
  filter(rowSums(is.na(pick(value2, value3, value4))) < 3)
#    sample value1 value2 value3 value4
# 1 sample1      A     NA     13     11
# 2 sample2      B      5     NA      4
# 3 sample3      A      7      7     NA
# 4 sample4   <NA>     NA     NA      9

wkyowqbh, answer #3

With dplyr::if_all, you can do:

library(dplyr, warn = FALSE)

myinput |>
  filter(!if_all(value2:value4, is.na))
#>    sample value1 value2 value3 value4
#> 1 sample1      A     NA     13     11
#> 2 sample2      B      5     NA      4
#> 3 sample3      A      7      7     NA
#> 4 sample4   <NA>     NA     NA      9
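if_all(value2:value4, is.na) is TRUE for a row only when is.na() is TRUE in every selected column, so negating it keeps the rows with at least one non-missing value. The same logic can be sketched in base R by folding the per-column is.na() vectors together with & via Reduce(); this is a swapped-in base technique, not how dplyr implements if_all().

```r
# Example data from the question
myinput <- data.frame(
  sample = paste0("sample", 1:5),
  value1 = c("A", "B", "A", NA, "C"),
  value2 = c(NA, 5, 7, NA, NA),
  value3 = c(13, NA, 7, NA, NA),
  value4 = c(11, 4, NA, 9, NA)
)
cols <- c("value2", "value3", "value4")

# TRUE where every selected column is NA in that row (only sample5)
all_na <- Reduce(`&`, lapply(myinput[cols], is.na))

myinput[!all_na, ]  # keeps sample1 to sample4
```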

Created on 2023-12-18 with reprex v2.0.2

4dbbbstv, answer #4

Using across inside filter:

library(dplyr)

myinput %>% 
  filter(rowSums(across(value2:value4, ~ is.na(.x))) < 3)
   sample value1 value2 value3 value4
1 sample1      A     NA     13     11
2 sample2      B      5     NA      4
3 sample3      A      7      7     NA
4 sample4   <NA>     NA     NA      9

