dplyr: filter on a vector rather than a dataframe in R

Posted by dohp0rv5 on 2023-03-05 in Other


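This seems like a simple question, but I have not come across a clean solution for it yet. I have a vector in R and I want to remove certain elements from it, but for various reasons I want to avoid the vector[vector != "thiselement"] notation. In particular, here is what I am trying to do: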

# this doesn't work
all_states = gsub(" ", "-", tolower(state.name)) %>% filter("alaska")

# this doesn't work either
all_states = gsub(" ", "-", tolower(state.name)) %>% filter(!= "alaska")

# this does work, but I want to avoid this approach to filtering
all_states = gsub(" ", "-", tolower(state.name))
all_states = all_states[all_states != "alaska"]

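Can this be done in a simple manner? Thanks in advance for your help!
EDIT - The reason I am struggling with this is that I can only find material online about filtering based on a column of a dataframe, for example: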

my_df %>% filter(col != "alaska")

but I am working with a vector here, not a dataframe.

Source: https://stackoverflow.com/questions/44169164/dplyr-filter-on-a-vector-rather-than-a-dataframe-in-r

4 Answers


vs3odd8k1#

Update

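As @r_31415 mentioned in the comments, packages such as stringr provide functions that address this problem better.
With str_subset(string, pattern, negate=FALSE), a character vector can be filtered like this: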

library(stringr)

# Strings that have at least one character that is neither "A" nor "B".
> c("AB", "BA", "ab", "CA") %>% str_subset("[^AB]")
[1] "ab" "CA"

# Strings that do not include characters "A" or "B".
> c("AB", "BA", "ab", "CA") %>% str_subset("[AB]", negate=TRUE)
[1] "ab"

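By default, pattern is interpreted as a regular expression. So, to match text that contains special characters such as (, * and ?, you can wrap the pattern string in the modifier function fixed(literal_string) instead of using double-backslash escapes or, since R 4.0.0, a raw string.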

# escape special character with "\\" (has to escape `\` with itself in a string literal).
> c("(123.5)", "12345") %>% str_subset("\\(123\\.5\\)")
[1] "(123.5)"

# R 4.0.0 supports raw-string, which is handy for regex strings
> c("(123.5)", "12345") %>% str_subset(r"{\(123\.5\)}")
[1] "(123.5)"

# use the fixed() modifier
> c("(123.5)", "12345") %>% str_subset(fixed("(123.5)"))
[1] "(123.5)"

## unexpected results if without escaping or the "fixed()" modifier
> c("(123.5)", "12345") %>% str_subset("(123.5)")
[1] "(123.5)" "12345"
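For comparison, the same regex-versus-literal distinction can be sketched in base R without stringr. This is not part of the original answer; it assumes only grep() with its value, fixed and invert arguments, and the variable names are made up for illustration.

```r
# Illustrative input, same strings as in the stringr example above.
x <- c("(123.5)", "12345")

# Pattern read as a regular expression: "(" and ")" form a group and "."
# matches any character, so both strings match.
regex_hits <- grep("(123.5)", x, value = TRUE)

# fixed = TRUE treats the pattern as a literal string, so only the first matches.
literal_hits <- grep("(123.5)", x, value = TRUE, fixed = TRUE)

# invert = TRUE keeps the non-matching elements, similar in spirit to negate = TRUE.
literal_misses <- grep("(123.5)", x, value = TRUE, fixed = TRUE, invert = TRUE)
```

Here fixed = TRUE plays the role that fixed() plays in stringr, and invert = TRUE the role of negate = TRUE.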

Original answer

Sorry for posting on a 5-month-old question in order to archive a simpler solution.
Package dplyr can filter character vectors in the following ways:

> c("A", "B", "C", "D") %>% .[matches("[^AB]", vars=.)]
[1] "C" "D"
> c("A", "B", "C", "D") %>% .[.!="A"]
[1] "B" "C" "D"

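The first way lets you filter with a regular expression, and the second way uses fewer words. It works because package dplyr imports package magrittr and, although it masks some of its functions such as extract, it does not mask the placeholder `.`.
Details of the placeholder `.` can be found in the help page of the forward-pipe operator %>%. The placeholder has three main usages:

Using the dot for secondary purposes
Using lambdas with %>%
Using the dot placeholder as lhs

Here we take advantage of its third usage.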
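As a minimal sketch (not from the original answer), the different ways of using the placeholder can be compared side by side on a small made-up vector. The variable names are illustrative assumptions, and the comments describe what each form does rather than quoting the help page.

```r
library(magrittr)

# Hypothetical input for illustration.
states <- c("alabama", "alaska", "arizona")

# Dot as a direct argument of the right-hand side: `[`(., . != "alaska").
kept_arg <- states %>% .[. != "alaska"]

# Dot only inside a nested expression: the left-hand side is still passed
# as the first argument, i.e. extract(states, states != "alaska").
kept_nested <- states %>% extract(. != "alaska")

# Braces turn the right-hand side into a lambda-like block.
kept_lambda <- states %>% { .[. != "alaska"] }

# A pipeline that starts with the dot yields a reusable function.
drop_alaska <- . %>% .[. != "alaska"]
kept_fseq <- drop_alaska(states)
```

All four forms should return the same two-element vector; which one reads best is largely a matter of taste.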


t2a7ltrp2#

You might want to try magrittr's extract, for example:

> library(magrittr)

> c("A", "B", "C", "D") %>% extract(.!="A")
[1] "B" "C" "D"

For more extract-like functions, load the magrittr package and open its aliases help page, for example with ?extract.


zzlelutf3#

Pretty sure dplyr only operates on data.frames. Here is a two-line example that coerces the vector to a data.frame and back.

myDf = data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska")
all_states = myDf$states

Or as a rather ugly one-liner:

all_states = (data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska"))$states


efzxgjgh4#

A simple way to get the desired result in the tidyverse is to put the vector into a tibble, filter it, and then pull the vector back out.

tibble(myvec = gsub(" ", "-", tolower(state.name))) %>% 
   filter(myvec != "alaska") %>% pull(myvec)

With the desired output: [1] "alabama" "arizona" "arkansas" "california" "colorado" ...
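If wrapping the vector in a tibble feels heavy, another tidyverse option that does not appear in the answers above is purrr's discard() (or its counterpart keep()), which works on vectors directly. The sketch below assumes purrr is installed; the variable names are illustrative.

```r
library(purrr)  # purrr re-exports %>% and provides keep() / discard()

# Same normalisation as in the question: lower case, spaces to hyphens.
all_states <- gsub(" ", "-", tolower(state.name))

# discard() drops every element for which the predicate is TRUE.
no_alaska <- all_states %>% discard(~ .x == "alaska")

# keep() is the mirror image: it retains the elements where the predicate is TRUE.
only_new <- all_states %>% keep(~ startsWith(.x, "new-"))
```

This keeps the pipeline shape the question asked for while avoiding the vector[vector != "x"] notation.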
