R - Remove rows from data frame that do not match (exactly) elements of list

Posted by jv4diomz on 2022-12-06 in Other


Imagine a data frame ...

df <- rbind("A*YOU 1.000 0.780", "A*YOUR 1.000 0.780", "B*USE 0.800 0.678", "B*USER 0.700 1.000")
df <- as.data.frame(df)
df

...which prints...

> df
                  V1
1  A*YOU 1.000 0.780
2 A*YOUR 1.000 0.780
3  B*USE 0.800 0.678
4 B*USER 0.700 1.000

...and I want to remove every row that does not contain any element of a list (here called tenables):

tenables <- c("A*YOU", "B*USE")

so that the result becomes:

> df
                  V1
1  A*YOU 1.000 0.780
2  B*USE 0.800 0.678

Is there a way to do this? Many thanks.

Source: https://stackoverflow.com/questions/74657825/r-remove-rows-from-data-frame-that-do-not-match-exactly-elements-of-list

3 Answers


9o685dep1#

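Because tenables contains regex-special characters (* means "zero or more of the preceding character/class/group"), and because we need word boundaries to tell the keys apart, we cannot use fixed=TRUE in the grep call. We therefore need to find those special characters and backslash-escape them. From there, we add \\b (word boundary) to distinguish YOU from YOUR, where adding a space or any other character might be over-constraining.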

## clean up tenables to be regex-friendly and precise
gsub("([].*+(){}[])", "\\\\\\1", tenables)
# [1] "A\\*YOU" "B\\*USE"

## combine into a single pattern for simple use in grep
paste0("\\b(", paste(gsub("([].*+(){}[])", "\\\\\\1", tenables), collapse = "|"), ")\\b")
# [1] "\\b(A\\*YOU|B\\*USE)\\b"

## subset your frame, keeping only the rows that match a tenable exactly
subset(df, grepl(paste0("\\b(", paste(gsub("([].*+(){}[])", "\\\\\\1", tenables), collapse = "|"), ")\\b"), V1))
#                  V1
# 1 A*YOU 1.000 0.780
# 3 B*USE 0.800 0.678

Regex explanation:

\\b(A\\*YOU|B\\*USE)\\b
^^^                 ^^^  "word boundary": a position between a word character
                         (A-Z, a-z, 0-9, or _) and a non-word character or the
                         begin/end of the string
   ^               ^     parens "group" the alternatives so that both \\b
                         anchors apply to whichever alternative matches
    ^^^^^^^              literal "A", "*", "Y", "O", "U" (same with other string)
           ^             the "|" means "OR", so either the "A*" or the "B*" string
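
To make the effect of each piece of the pattern visible, here is a minimal sketch that tests three variants against sample strings. The vector x and its third element ("YOU 1.000 0.780") are made up for illustration and are not part of the original data.

```r
# Sample strings; the third one is a hypothetical row without the "A*" prefix
x <- c("A*YOU 1.000 0.780", "A*YOUR 1.000 0.780", "YOU 1.000 0.780")

# Unescaped: "A*" means "zero or more A", so a bare "YOU" matches as well
unescaped <- grepl("A*YOU", x)
unescaped
# [1] TRUE TRUE TRUE

# Escaped: the asterisk is literal, but "A*YOUR" still matches as a substring
escaped <- grepl("A\\*YOU", x)
escaped
# [1]  TRUE  TRUE FALSE

# Escaped plus word boundaries: only the exact key "A*YOU" matches
bounded <- grepl("\\bA\\*YOU\\b", x)
bounded
# [1]  TRUE FALSE FALSE
```

Only the last variant separates A*YOU from A*YOUR, which is why the answer combines escaping with \\b.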


mxg2im7a2#

> df[gsub("\\s*\\d+\\.*", "", df$V1) %in% tenables, ,drop=FALSE]
                 V1
1 A*YOU 1.000 0.780
3 B*USE 0.800 0.678
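
This answer comes without an explanation, so the following sketch breaks the one-liner into steps. The names keys, keep, and result are introduced here for illustration, and the key "A*YOU2" in the last call is a hypothetical value used to show a limitation; it does not appear in the question.

```r
# Data from the question
df <- as.data.frame(rbind("A*YOU 1.000 0.780", "A*YOUR 1.000 0.780",
                          "B*USE 0.800 0.678", "B*USER 0.700 1.000"))
tenables <- c("A*YOU", "B*USE")

# Strip every run of digits, with optional leading whitespace and trailing dots
keys <- gsub("\\s*\\d+\\.*", "", df$V1)
keys
# [1] "A*YOU"  "A*YOUR" "B*USE"  "B*USER"

# Exact, non-regex comparison against the allowed keys
keep <- keys %in% tenables
keep
# [1]  TRUE FALSE  TRUE FALSE

result <- df[keep, , drop = FALSE]

# Limitation: a key that itself contains a digit loses that digit too
gsub("\\s*\\d+\\.*", "", "A*YOU2 1.000 0.780")
# [1] "A*YOU"
```

Because %in% compares whole strings, no escaping is needed here. The approach does assume that the keys themselves contain no digits.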


yhxst69z3#

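One approach is to use sapply over the strsplit of the df column, looking only at the first entry of each row (e.g. A*YOU in A*YOU 1.000 0.780). The entry is compared exactly against tenables with %in%, because passing it to grepl as a pattern would treat the * as a regex operator.

df[sapply(strsplit(df$V1, " "), function(x)
  x[1] %in% tenables), , drop = FALSE]
                 V1
1 A*YOU 1.000 0.780
3 B*USE 0.800 0.678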

