R: removing words in a character vector from a string

g0czyy6m posted on 2022-12-25 in Other


I have a character vector of stopwords in R:

stopwords = c("a" ,
            "able" ,
            "about" ,
            "above" ,
            "abst" ,
            "accordance" ,
            ...
            "yourself" ,
            "yourselves" ,
            "you've" ,
            "z" ,
            "zero")

Suppose I have the string:
str <- c("I have zero a accordance")
How can I remove my defined stopwords from str?
I think gsub or another grep tool could be a good candidate for this, although other suggestions are welcome.
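For comparison, here is a minimal sketch of the gsub route the question mentions, using base R only. The helper name remove_stopwords_regex is hypothetical. All stopwords are joined into a single alternation wrapped in \b word boundaries; each word is quoted with \Q...\E so characters such as the apostrophe in "you've" are taken literally (this assumes no stopword contains "\E"), and the spaces left behind are collapsed afterwards.

```r
# Hypothetical helper, base R only. Assumes no stopword contains "\E".
remove_stopwords_regex <- function(str, stopwords) {
  # \Q...\E makes PCRE treat each stopword literally (e.g. the ' in "you've").
  quoted <- paste0("\\Q", stopwords, "\\E")
  # One alternation anchored by word boundaries, so "a" is not removed
  # from inside "have" and "able" is not cut down to "ble".
  pattern <- paste0("\\b(", paste(quoted, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", str, perl = TRUE)
  # Collapse the runs of spaces left behind and trim both ends.
  trimws(gsub("[[:space:]]+", " ", out))
}

stopwords <- c("a", "able", "about", "above", "abst", "accordance",
               "yourself", "yourselves", "you've", "z", "zero")
remove_stopwords_regex("I have zero a accordance", stopwords)
# [1] "I have"
```

Because gsub() is vectorised, this also works on several strings at once. Unlike splitting on spaces, it matches a stopword followed by punctuation ("zero," loses "zero" but keeps the comma). Matching is case-sensitive unless ignore.case = TRUE is added to the gsub() call.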

Source: https://stackoverflow.com/questions/35790652/removing-words-featured-in-character-vector-from-string

3 Answers


waxmsbnn1#

Try this:

str <- c("I have zero a accordance")

stopwords = c("a", "able", "about", "above", "abst", "accordance", "yourself",
"yourselves", "you've", "z", "zero")

x <- unlist(strsplit(str, " "))

x <- x[!x %in% stopwords]

paste(x, collapse = " ")

# [1] "I have"

- Addition: writing a removeWords function is simple, so there is no need to load an external package for this:

removeWords <- function(str, stopwords) {
  x <- unlist(strsplit(str, " "))
  paste(x[!x %in% stopwords], collapse = " ")
}

removeWords(str, stopwords)
# [1] "I have"
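The removeWords() function above compares tokens case-sensitively, so a capitalised stopword at the start of a sentence ("Zero", "A") survives. A minimal variant, assuming case should be ignored, lowercases both sides before the %in% test; the name removeWordsIgnoreCase is hypothetical. Kept words retain their original capitalisation.

```r
# Hypothetical case-insensitive variant of removeWords() above.
removeWordsIgnoreCase <- function(str, stopwords) {
  x <- unlist(strsplit(str, " "))
  # Lowercase only for the comparison; the returned tokens keep their case.
  paste(x[!tolower(x) %in% tolower(stopwords)], collapse = " ")
}

stopwords <- c("a", "able", "about", "above", "abst", "accordance",
               "yourself", "yourselves", "you've", "z", "zero")
removeWordsIgnoreCase("Zero is A number", stopwords)
# [1] "is number"
```

Like the original, it returns a single string even when str has several elements, because unlist() merges the tokens of all elements together.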

2022-12-25

k97glaaz2#

You can use the tm package to do this:

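library(tm)
# Call tm's version explicitly, in case a removeWords() function from the
# first answer is also defined in the session.
tm::removeWords(str, stopwords)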
#[1] "I have   "
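tm's removeWords() deletes only the matched words, not the spaces between them, which is why the result above is "I have   " with trailing spaces. tm also provides stripWhitespace() for collapsing repeated whitespace; a dependency-free sketch of the same cleanup in base R, which additionally trims both ends, could look like this (the helper name squish is hypothetical):

```r
# Hypothetical base-R helper: collapse runs of whitespace to one space,
# then trim leading and trailing whitespace.
squish <- function(x) {
  trimws(gsub("[[:space:]]+", " ", x))
}

squish("I have   ")
# [1] "I have"
```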


xzabzqsa3#

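If you want the code to be vectorised over many sentences rather than just one, here is another function option, which borrows from Mikko's original answer.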

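remove_words <- function(str, words) {
  # map_chr() applies the single-sentence logic to each element of str
  # and returns a character vector of the same length.
  purrr::map_chr(
    str,
    function(sentence) {
      sentence_split <- unlist(strsplit(sentence, " "))
      paste(sentence_split[!sentence_split %in% words], collapse = " ")
    }
  )
}

remove_words(c('Hello world', 'This is another sentence', 'Test sentence 3'), c('world', 'sentence'))
# returns c("Hello", "This is another", "Test 3")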

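If you would rather not depend on purrr, the same per-sentence logic can be written with base R's vapply(), which also enforces exactly one character result per input sentence. This is a sketch; the name remove_words_base is hypothetical.

```r
# Hypothetical base-R version of remove_words(): vapply() replaces purrr::map_chr().
remove_words_base <- function(str, words) {
  vapply(
    str,
    function(sentence) {
      # Split one sentence on single spaces and drop tokens found in `words`.
      tokens <- strsplit(sentence, " ", fixed = TRUE)[[1]]
      paste(tokens[!tokens %in% words], collapse = " ")
    },
    character(1),      # each sentence must yield exactly one string
    USE.NAMES = FALSE  # return a plain, unnamed character vector
  )
}

remove_words_base(c("Hello world", "This is another sentence", "Test sentence 3"),
                  c("world", "sentence"))
# returns c("Hello", "This is another", "Test 3")
```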
