R: check whether a string starts with any string in a list of strings

Posted by 1l5u6lss on 2023-06-19 in Other


I have a list of stems, for example:

stems <- c("fri", "odd", "inspi")

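I want to check whether a word starts with any of these stems and, if so, return that stem. For example, "fright" starts with "fri", so I want to return "fri".
On the other hand, although "todd" contains "odd", it does not start with "odd", so I don't want to return anything.
Is there a way to do this? I tried str_starts() with a list as the pattern argument, but that doesn't seem to work.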

Duplicates are not a problem in my data.

As a simple example, if my data looks like:

dat <- tibble(complete_word = c("fright", "todd", "quirky", "oddly"))

I would like to get back:

dat <- tibble(complete_word = c("fright", "todd", "quirky", "oddly"),
              stem = c("fri", NA, NA, "odd"))

Source: https://stackoverflow.com/questions/76419166/check-if-string-starts-with-any-string-in-list-of-strings

5 answers


6ju8rftf1#

In base R, you can use startsWith inside sapply and then subset stems with the indices returned by max.col.

. <- sapply(c(stems, ""), startsWith, x=dat$complete_word)
c(stems, NA)[max.col(., "first")]
#[1] "fri" NA    NA    "odd"

#Alternative thanks to @jay.sf
. <- vapply(c(stems, ""), startsWith, x=dat$complete_word, logical(nrow(dat)))
c(stems, NA)[max.col(., "first")]

The same thing using a pipe.

sapply(c(stems, ""), startsWith, x=dat$complete_word) |>
max.col("first") |>
(`[`)(c(stems, NA), i=_)
#[1] "fri" NA    NA    "odd"
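To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the same matrix on a plain character vector (`words`, `m`, `idx` and `result` are stand-in names for illustration, not part of the answer). The extra `""` acts as a sentinel: every word starts with the empty string, so each row has at least one TRUE, and `max.col(., "first")` picks the first TRUE column in each row.

```r
stems <- c("fri", "odd", "inspi")
words <- c("fright", "todd", "quirky", "oddly")  # stand-in for dat$complete_word

# One column per stem, plus a last column for "" that every word matches.
m <- sapply(c(stems, ""), startsWith, x = words)

# Column of the first TRUE in each row; 4 means only the "" sentinel matched.
idx <- max.col(m, "first")
idx
#[1] 1 4 4 2

# Position 4 of c(stems, NA) is NA, so words without a matching stem become NA.
result <- c(stems, NA)[idx]
result
#[1] "fri" NA    NA    "odd"
```

If several stems could match the same word, this would return the one that comes first in `stems`.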


cedebl8k2#

Here is one way using the tidyverse. Use map and str_starts to get the index of the matching element of the stems vector, if there is one.

library(dplyr)
library(purrr)
library(stringr)  # provides str_starts()

dat %>% 
  mutate(idx = map(complete_word, ~ which(str_starts(.x, stems) == 1)), 
         stem = stems[as.integer(idx)])

Result:

# A tibble: 4 × 3
  complete_word idx       stem 
  <chr>         <list>    <chr>
1 fright        <int [1]> fri  
2 todd          <int [0]> NA   
3 quirky        <int [0]> NA   
4 oddly         <int [1]> odd


ikfrs5lh3#

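You need to iterate over the words, build for each word a logical vector marking which stems it starts with, use that vector to subset stems, and return NA when nothing matches.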

library(tidyverse)

stems <- c("fri","odd","inspi")
dat <- c("fright","todd","quirky","oddly")

dat %>% 
  map(\(x){str_starts(x, stems)}) %>% 
  map_chr(\(x){ifelse(any(x), stems[x], NA_character_)})
#> [1] "fri" NA    NA    "odd"

Created on 2023-06-06 with reprex v2.0.2


qojgxg4l4#

Using a regexpr/regmatches approach.

f <- \(x, st) {
  p <- paste0('^', st, collapse='|')  ## gives e.g. "^fri|^odd|^inspi"
  m <- regexpr(p, text=x)
  replace(rep.int(NA_character_, length(x)), m > 0L, regmatches(x, m))
}

f(dat$complete_word, stems)
# [1] "fri" NA    NA    "odd"

transform(dat, stem=f(complete_word, stems))
#   complete_word stem
# 1        fright  fri
# 2          todd <NA>
# 3        quirky <NA>
# 4         oddly  odd

Data:

dat <- structure(list(complete_word = c("fright", "todd", "quirky", 
"oddly")), class = "data.frame", row.names = c(NA, -4L))

stems <- c("fri", "odd", "inspi")
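One caveat with the regex approach: the stems are pasted into the pattern as they are, so a stem containing regex metacharacters (for example a dot) would be treated as a pattern rather than as literal text. The stems in the question are plain letters, so this does not matter there. If it did, a possible variant is sketched below. `f_literal` is a hypothetical name, not from the answer, and it assumes no stem contains the sequence `\E`.

```r
# Hypothetical variant of f(): stems are matched literally, not as regex.
f_literal <- function(x, st) {
  # \Q...\E quotes everything in between (PCRE syntax, hence perl = TRUE).
  p <- paste0("^\\Q", st, "\\E", collapse = "|")
  m <- regexpr(p, text = x, perl = TRUE)
  replace(rep.int(NA_character_, length(x)), m > 0L, regmatches(x, m))
}

# "a.b" must match a literal dot, so "axb1" gets NA.
out <- f_literal(c("a.b1", "axb1", "fright"), c("a.b", "fri"))
out
# [1] "a.b" NA    "fri"
```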


uemypmqf5#

We can try

dat %>%
    mutate(stem = {
        m <- outer(complete_word, stems, Vectorize(startsWith))
        stems[rowSums(m * col(m)) * NA^(rowSums(m) == 0)]
    })

which gives

# A tibble: 4 × 2
  complete_word stem
  <chr>         <chr>
1 fright        fri
2 todd          NA
3 quirky        NA
4 oddly         odd
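The indexing expression is compact, so here is a step-by-step sketch of what it computes, run outside of mutate on a plain vector (`words`, `hit`, `mask` and `result` are illustrative names, not from the answer). It relies on `NA^0` being 1 and `NA^1` being NA in R, and on each word matching at most one stem.

```r
stems <- c("fri", "odd", "inspi")
words <- c("fright", "todd", "quirky", "oddly")  # stand-in for complete_word

# 4 x 3 logical matrix: does word i start with stem j?
m <- outer(words, stems, Vectorize(startsWith))

# Multiplying by the column numbers turns each TRUE into its stem index;
# with at most one match per row, the row sum is that index (0 if no match).
hit <- rowSums(m * col(m))

# NA^FALSE is 1 and NA^TRUE is NA, so rows without any match are masked to NA.
mask <- NA^(rowSums(m) == 0)

result <- stems[hit * mask]
result
# [1] "fri" NA    NA    "odd"
```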

