regex - How to remove inner parentheses from strings in R?

Posted by gojuced7 on 2022-11-26 in Other

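I'm working with strings in R that should contain zero or one pair of parentheses. If there are nested parentheses, I need to remove the inner pair. In the example below, I need to remove the parentheses around big bent nachos but not the other (outer) parentheses.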

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)

I know I can remove all parentheses with str_remove_all() from the stringr package:

test |>
  stringr::str_remove_all(stringr::fixed(")")) |> 
  stringr::str_remove_all(stringr::fixed("("))
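For comparison, here is a base R sketch of the same remove-everything approach. It assumes that `gsub()` with the character class `[()]` is an acceptable stand-in for the two `str_remove_all()` calls. The result shows that the outer pair disappears as well, which is why this alone does not meet the requirement.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))"
)

# [()] matches either "(" or ")", so every parenthesis is dropped,
# outer ones included
all_removed <- gsub("[()]", "", test)
all_removed
# [3] "What is the best food? choice=Tacos big bent nachos"
```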

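But I don't have the RegEx skills to select only the inner parentheses. I found an SO post that is close, but it removes the outer parentheses, and I couldn't untangle it to remove the inner ones instead.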
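You can also make "inner" concrete without a regex by tracking nesting depth. The sketch below swaps in that technique and is not taken from any of the answers. `remove_inner_parens()` is a hypothetical helper that keeps parentheses at depth 1 and drops deeper ones, assuming the inputs are non-NA strings.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# Hypothetical helper: walk each string one character at a time, count the
# nesting depth, and keep only the outermost "(" and ")"
remove_inner_parens <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "")[[1]]
    depth <- 0L
    keep <- rep(TRUE, length(chars))
    for (i in seq_along(chars)) {
      if (chars[i] == "(") {
        depth <- depth + 1L
        keep[i] <- depth == 1L       # keep only a depth-1 "("
      } else if (chars[i] == ")") {
        keep[i] <- depth <= 1L       # keep a depth-1 (or stray) ")"
        depth <- max(depth - 1L, 0L)
      }
    }
    paste(chars[keep], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

remove_inner_parens(test)
remove_inner_parens("Q (a (b (c)) d)")  # deeper nesting is flattened too
```

A character loop is slower than a single regex call. However, it handles any nesting depth, and the depth rule is easy to adjust.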

Answer 1 (dldeef67)

Here you go:
[code block not preserved]

Edit

Fixed my solution so that no text is lost:
[code block not preserved]

Answer 2 (bxfogqkk)

Because there was interest in how to handle several (...) groups inside the outer parentheses, I came up with the following lookahead-based idea.

test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl = TRUE)

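See the R demo at tio.run or the pattern demo at regex101 (replace with \1, the capture of the first group).
At each (...), the lookahead verifies that it is followed only by further (...) groups or non-parenthesis characters up to a closing ).
If there is arbitrary nesting, a recursive regex can flatten the first level.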
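To see what the lookahead accepts, the sketch below applies the pattern from this answer to a few inputs. The second and third strings are hypothetical test cases that are not from the question. An inner group is flattened only when the lookahead can reach the outer closing `)`, so two separate top-level pairs stay untouched.

```r
# Lookahead pattern copied from the answer above
pat <- "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))"

x <- c(
  "What is the best food? (choice=Tacos (big bent nachos))",
  "Q (a (b) c (d))",  # hypothetical: two inner groups in one outer pair
  "(x) and (y)"       # hypothetical: two separate top-level pairs
)

res <- gsub(pat, "\\1", x, perl = TRUE)
res
# [1] "What is the best food? (choice=Tacos big bent nachos)"
# [2] "Q (a b c d)"
# [3] "(x) and (y)"
```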

test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl = TRUE)

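Another R demo at tio.run, or the regex101 demo (replace with \2, the capture of the second group).

| Regex part | Explanation |
| --- | --- |
| `(?:\G(?!^)\|\()` | Match an opening parenthesis, or chain matches to it by use of `\G` |
| `[^)(]*+\K` | Consume any amount of non-parenthesis characters; `\K` resets the start of the reported match |
| `(\(((?>[^)(]+\|(?1))*)\))` | Match nested parentheses (explanation at php.net ↗). It contains two capture groups: the first is recursed at `(?1)`; the second captures what is inside the first. |

Here the matches are chained to the opening parenthesis. There is no check for the outer closing ). This \G-based idea can also be used without recursion, but it is slightly less efficient.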
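The recursive pattern flattens only the first nested level per call. The sketch below shows this on a hypothetical, more deeply nested string. `flatten_all()` is a hypothetical helper, not part of the answer. It reapplies the same replacement until the string stops changing, on the assumption that you want every level below the outermost pair flattened.

```r
# Recursive pattern copied from the answer above
pat <- "(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))"

# One pass: only the first nested level is flattened
one_pass <- gsub(pat, "\\2", "Q (a (b (c)) d)", perl = TRUE)
one_pass  # "Q (a b (c) d)"

# Hypothetical helper: repeat the replacement until nothing changes
flatten_all <- function(x, pattern = pat) {
  repeat {
    y <- gsub(pattern, "\\2", x, perl = TRUE)
    if (identical(y, x)) return(y)
    x <- y
  }
}

flatten_all("Q (a (b (c)) d)")  # "Q (a b c d)"
```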

Answer 3 (5t7ly7z5)

Assuming there is at most one nested pair of parentheses, we can use gsub():

output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (choice=Tacos)"           
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

Data:

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)
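As the output above shows, this pattern drops the inner text ("big bent nachos") together with its parentheses. If you want to keep that text, the following variant may help. It is an assumption of mine, not this answer's code, and it keeps the same single-inner-pair assumption. It captures the text before and inside the inner pair and writes both back.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# Group 1: "(" plus the text before the inner pair
# Group 2: the text inside the inner pair
# Writing back "\\1\\2" removes only the inner "(" and ")"
kept <- gsub("(\\([^()]*)\\(([^()]*)\\)", "\\1\\2", test)
kept
# [3] "What is the best food? (choice=Tacos big bent nachos)"
```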
Answer 4 (hsgswve4)

Here is a solution using gsub() from base R. It is split into two steps for readability and easier debugging.

test <- c(
   "Record ID", 
   "What is the best food? (choice=Nachos)", 
   "What is the best food? (choice=Tacos (big bent nachos))", 
   "What is the best food? (choice=Chips with stuff)", 
   "Complete?"
) 

test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*) - group 1: a '(' followed by zero or more characters (greedy)
# \\(     - then another '('
# "\\1"   - replace the whole match with group 1, which drops the second '('

test <- gsub("\\)(.*\\))", "\\1", test)
# similar to the first step: drop a ')' that is followed by another ')'
test

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"
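The two steps can be wrapped in a function to show how the greedy `.*` behaves. `remove_inner_two_step()` is a hypothetical name, and the second input is a hypothetical case outside the question's "zero or one pair" assumption. Each step removes at most one parenthesis per string: the last `(` that has an earlier `(`, then the first `)` that has a later `)`. As a result, two separate pairs get merged.

```r
# Hypothetical wrapper around the two gsub() steps from this answer
remove_inner_two_step <- function(x) {
  # greedy: remove the last "(" when an earlier "(" exists
  x <- gsub("(\\(.*)\\(", "\\1", x)
  # greedy: remove the first ")" when a later ")" exists
  gsub("\\)(.*\\))", "\\1", x)
}

remove_inner_two_step("What is the best food? (choice=Tacos (big bent nachos))")
# "What is the best food? (choice=Tacos big bent nachos)"

remove_inner_two_step("x (a) and (b)")  # "x (a and b)": separate pairs merge
```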
