R dplyr subset with missing columns

Posted by hmae6n7t on 2022-12-20 in Other


I have the code below and want to select some of its columns into a new data.frame.

library(dplyr)
df = data.frame(
    Manhattan=c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0), 
    Brooklyn=c(0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0), 
    The_Bronx=c(1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0), 
    Staten_Island=c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), 
    "2012"=c("P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), 
    "2013"=c("P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q"), 
    "2014"=c("P", "P", "P", "Q", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "P", "Q", "P", "P", "P", "Q", "Q"), 
    "2015"=c("P", "P", "P", "P", "P", "Q", "Q", "Q", "P", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), check.names=FALSE)
df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx"))

This throws an error:

Error in `[.data.frame`(x, r, vars, drop = drop) : 
   undefined columns selected

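The error occurs because the column "Queens" is missing from df. How can I get around it so that R goes ahead and creates df2 with only the columns "Manhattan" and "The_Bronx"?
Very important: my real data has hundreds of columns, so manually removing columns like "Queens" from the command df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx")) is not feasible (unless there is some trick?). Is there a way to solve this? Thanks.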

Source: https://stackoverflow.com/questions/61152518/r-dplyr-subset-with-missing-columns

3 Answers

pkmbmrz71#

In base R, you can use intersect to select only the names that are actually present.

cols <- c("Manhattan", "Queens", "The_Bronx")
subset(df, select = intersect(names(df), cols))

#   Manhattan The_Bronx
#1          1         1
#2          1         1
#3          0         0
#4          1         0
#5          1         0
#6          1         0
#7          1         0
#8          0         0
#...
#....

Or use any_of in dplyr:

library(dplyr)
df %>% select(tidyselect::any_of(cols))
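The two approaches above differ in one detail that may matter with hundreds of columns: intersect(names(df), cols) returns columns in the order they appear in df, while any_of(cols) follows the order of cols. The sketch below illustrates this on a small stand-in data frame (the values and the reordered cols vector are assumptions made for the example, not the question's data) and also uses setdiff to report which requested names were skipped.

```r
library(dplyr)

# Small stand-in for the question's data frame (values are illustrative only)
df <- data.frame(
  Manhattan = c(1, 0, 1),
  Brooklyn  = c(0, 1, 0),
  The_Bronx = c(1, 1, 0)
)

# Requested columns, deliberately not in df's order and including a missing one
cols <- c("The_Bronx", "Queens", "Manhattan")

# Names that were requested but do not exist in df
missing_cols <- setdiff(cols, names(df))
if (length(missing_cols) > 0) {
  message("Skipping missing columns: ", paste(missing_cols, collapse = ", "))
}

# intersect(names(df), cols) keeps the column order of df
by_df_order <- df[, intersect(names(df), cols), drop = FALSE]

# intersect(cols, names(df)) and any_of(cols) keep the order given in cols
by_cols_order <- df[, intersect(cols, names(df)), drop = FALSE]
by_any_of     <- df %>% select(any_of(cols))

names(by_df_order)
names(by_cols_order)
names(by_any_of)
```

If the order of the result should follow the request rather than the data, swapping the arguments of intersect or using any_of is probably the simpler choice.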


wa7juj8i2#

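We can also use matches() with a regular expression built from the column names:

cols <- c("Manhattan", "Queens", "The_Bronx")
library(dplyr)
library(stringr)  # provides str_c(); dplyr alone does not export it
df %>%
   select(matches(str_c(cols, collapse="|")))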
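One caveat with the regular-expression approach: matches() does partial matching, so a pattern like "Manhattan|Queens|The_Bronx" also selects any column whose name merely contains one of those strings. The sketch below shows the effect with a hypothetical extra column, Manhattan_North, which is not in the question's data, and one way to anchor the pattern so only exact names match. It assumes the names contain no regex metacharacters; otherwise they would need escaping first.

```r
library(dplyr)

df <- data.frame(
  Manhattan       = c(1, 0),
  Manhattan_North = c(0, 1),  # hypothetical column, added only to show the pitfall
  The_Bronx       = c(1, 1)
)
cols <- c("Manhattan", "Queens", "The_Bronx")

# Unanchored pattern: partial matches are selected too
loose <- df %>% select(matches(paste(cols, collapse = "|")))

# Anchored pattern ^(a|b|c)$ only matches whole column names
pattern <- paste0("^(", paste(cols, collapse = "|"), ")$")
strict  <- df %>% select(matches(pattern))

names(loose)
names(strict)
```

For exact names, any_of(cols) avoids the regex concerns altogether.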


8ulbf1ek3#

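Current versions of dplyr accept a character vector of variable names as the second argument to dplyr::select(), but wrapping that vector in all_of() is recommended to reduce ambiguity.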

varnames <- c("mpg", "cyl", "carb")

Both of the following lines produce the same output:

dplyr::select(mtcars, varnames)
dplyr::select(mtcars, all_of(varnames))

Output:

                     mpg cyl carb
 Mazda RX4            21   6    4
 Mazda RX4 Wag        21   6    4
 Datsun 710           23   4    1
 Hornet 4 Drive       21   6    1
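Worth noting for the original question: all_of() is strict and raises an error when any name in the vector is missing, whereas any_of() silently skips missing names. A minimal sketch on mtcars, where "not_a_column" is a made-up name used only to trigger the difference:

```r
library(dplyr)

# "not_a_column" is a made-up name that does not exist in mtcars
wanted <- c("mpg", "cyl", "not_a_column")

# all_of() is strict: the missing name causes an error, caught here
strict_result <- tryCatch(
  select(mtcars, all_of(wanted)),
  error = function(e) "error"
)

# any_of() is lenient: missing names are dropped without complaint
lenient <- select(mtcars, any_of(wanted))

strict_result
names(lenient)
```

So all_of() suits cases where a missing column should stop the script, and any_of() suits the situation in the question.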

