How can I remove inner parentheses from an R string?

Posted by gojuced7 on 2022-11-26 in Other


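I am processing strings in R that are supposed to contain zero or one pair of parentheses. If there are nested parentheses, I need to delete the inner pair. Here is an example where I need to delete the parentheses around big bent nachos, but not the other (outer) parentheses.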

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)

I know I can delete all parentheses with str_remove_all() from the stringr package:

test |>
  stringr::str_remove_all(stringr::fixed(")")) |> 
  stringr::str_remove_all(stringr::fixed("("))

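But I don't have the regex skills to select only the inner parentheses. I found an SO post that is close, but it removes the outer parentheses, and I couldn't untangle it to remove the inner ones instead.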


Source: https://stackoverflow.com/questions/74525811/how-can-i-remove-inner-parentheses-from-an-r-string

4 Answers


dldeef671#

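Here you go.

(code block not preserved in the source)

Edit

Fixed my solution so that no text is lost:

(code block not preserved in the source)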


bxfogqkk2#

I was curious how to solve this when there are multiple (...) pairs inside the outer parentheses, and came up with the following lookahead-based idea.

test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl = TRUE)

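See the R demo at tio.run or the pattern demo at regex101 (the replacement is \1, the capture of the first group).
At each (...) the lookahead checks that, up to a closing ), what follows contains either only complete (...) pairs or no parentheses at all.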
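To see what the lookahead accepts and rejects, here is a small sketch that applies the same pattern to a few inputs. The vector `demo` and the names `inner_pairs` and `flattened` are made up for illustration and do not come from the question.

```r
# Lookahead pattern from the answer above, unchanged.
inner_pairs <- "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))"

# Hypothetical inputs, not taken from the question.
demo <- c(
  "Q (a (b) c (d) e)",  # two inner pairs inside one outer pair
  "Q (a) and (b)",      # two sibling pairs, no nesting
  "Q (a)"               # a single pair
)

# perl = TRUE is required because lookaheads are a PCRE feature.
flattened <- gsub(inner_pairs, "\\1", demo, perl = TRUE)
flattened
```

Only the first string should change (to "Q (a b c d e)"): a pair is unwrapped only when a closing ) can still be reached after it without crossing an unmatched (.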
If the nesting can be arbitrarily deep, flattening the first level can be done with a recursive regex.

test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl = TRUE)

Another R demo at tio.run, or the regex101 demo (the replacement is \2, the capture of the second group).
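| Regex part | Explanation |
| --- | --- |
| `(?:\G(?!^)\|\()` | Matches an opening parenthesis, or continues where the previous match ended (chaining matches by use of \G) |
| `[^)(]*+\K` | Consumes any amount of non-parenthesis characters; \K resets the start of the reported match |
| `(\(((?>[^)(]+\|(?1))*)\))` | Matches nested parentheses (explanation at php.net). It contains two capture groups: the first is recursed at (?1); the second captures what is inside the first. |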
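Here the matches are chained to an opening parenthesis. The outer closing ) is not checked. This \G-based idea can also be used without recursion, but it is slightly less efficient.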
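As a cross-check for arbitrarily nested input, the same goal can be sketched without any regex by tracking nesting depth one character at a time. This is a swapped-in technique, not the recursive pattern above: `strip_inner_parens` is a hypothetical helper, it assumes no NA values, and it removes every inner level in one pass, whereas the recursive regex flattens only the first level.

```r
# Hypothetical helper (not from the answers): keep only parentheses at
# nesting depth 1 and drop all deeper ones. Assumes x contains no NA.
strip_inner_parens <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    depth <- 0L
    keep <- rep(TRUE, length(chars))
    for (i in seq_along(chars)) {
      if (chars[i] == "(") {
        depth <- depth + 1L
        keep[i] <- depth == 1L        # keep only the outermost "("
      } else if (chars[i] == ")") {
        keep[i] <- depth <= 1L        # keep the ")" closing depth 1 (or a stray one)
        depth <- max(depth - 1L, 0L)
      }
    }
    paste(chars[keep], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# The question's example string plus a deeper, made-up one.
nested <- c(
  "What is the best food? (choice=Tacos (big bent nachos))",
  "(a (b (c)) d)"
)
stripped <- strip_inner_parens(nested)
stripped
```

A loop like this is slower than a single gsub() call on long vectors, but its behaviour on edge cases is easier to reason about.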


5t7ly7z53#

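Assuming there is at most one nested pair of parentheses, we can use a gsub() approach. Note that this removes the inner parentheses together with their contents: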

output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (choice=Tacos)"           
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

Data:

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)
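If the text inside the inner parentheses should be kept, one possible variant under the same assumption (at most one nested pair) captures it as its own group. This is a sketch, not part of the answer above; the pattern and the name `kept` are assumptions introduced here.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# Three groups: text before the inner pair, text inside it, text after it.
# [^()]* never crosses a parenthesis, so strings without nesting do not match.
kept <- gsub("\\(([^()]*)\\(([^()]*)\\)([^()]*)\\)", "(\\1\\2\\3)", test)
kept
```

With the data above, only the third element should change, to "What is the best food? (choice=Tacos big bent nachos)".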


hsgswve44#

Here is a solution using base R's gsub(). It is split into two steps to make it easier to read and debug.

test <- c(
   "Record ID", 
   "What is the best food? (choice=Nachos)", 
   "What is the best food? (choice=Tacos (big bent nachos))", 
   "What is the best food? (choice=Chips with stuff)", 
   "Complete?"
) 

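test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*)  - first group: the first '(' followed by as many characters as possible
# \\(      - then another '('; because .* is greedy, this is the last '(' in the string
# "\\1"    - replace the whole match with the first group, which drops that last '('

test <- gsub("\\)(.*\\))", "\\1", test)
# Same idea in reverse: match the first ')' and everything up to the last ')',
# then keep only the group, which drops that first ')'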
test

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"
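Because both steps rely on greedy matching, they act on the last ( and the first ) in the string regardless of nesting. The sketch below wraps the two calls in a hypothetical function, `remove_inner_two_step`, and runs it on made-up inputs to show what happens when the "at most one outer pair" assumption from the question does not hold.

```r
# Hypothetical wrapper around the two gsub() calls from the answer above.
remove_inner_two_step <- function(x) {
  x <- gsub("(\\(.*)\\(", "\\1", x)  # drops the last "(" if there are two or more
  gsub("\\)(.*\\))", "\\1", x)       # drops the first ")" if there are two or more
}

nested_result  <- remove_inner_two_step("Q (a (b) c)")    # one nested pair
sibling_result <- remove_inner_two_step("Q (a) and (b)")  # two sibling pairs
c(nested_result, sibling_result)
```

The nested input should come out as "Q (a b c)", while the two sibling pairs should be merged into "Q (a and b)", so this approach is safe only when the input is known to have a single outer pair.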

