Regex: capture the text between the start of one tag and the start of the next?

Posted by eulz3vhy on 2023-06-30 in Other


I have the following string:

<<John Smith, Youtube>> 
I'm having a great day today 
<<Jane Doe, Google>> 
I'm going to the gym later 
<<Speaker>>
Time for people to speak 
<<Beff Jezos>> 
Buy something from my online shop. You might like it

You can load this string in Python with the following snippet:

string = '''<<John Smith, Youtube>> 
I'm having a great day today 
<<Jane Doe, Google>> 
I'm going to the gym later 
<<Speaker>>
Time for people to speak 
<<Beff Jezos>> 
Buy something from my online shop. You might like it'''

Load the Python package:

import re

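I'm trying to find a way to capture the following: I want to extract all of the text from one << up to the next <<.
For example, that would mean extracting the following strings: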

string1: John Smith, Youtube>> 
I'm having a great day today 

string2: Jane Doe, Google>> 
I'm going to the gym later 

string3: Speaker>>
Time for people to speak 

string4: Beff Jezos>> 
Buy something from my online shop. You might like it

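The output can be a list or a dictionary of key-value pairs. The values between << and >> are identifiers, but they are not always unique; some of them repeat.
Any help is appreciated. My current regex has gotten me this far: /(?=<<)(.*)(?=<<)/gm
New string: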

Welcome to the first meeting today between Yotube, Google and Amazon  Special guest speaker today is Beff Jezos 
<<John Smith, Youtube>>  I'm having a great day today  
<<Jane Doe, Google>>  I'm going to the gym later  
<<Speaker>> Time for people to speak  
<<Beff Jezos>>  Buy something from my online shop. You might like it

regex

Source: https://stackoverflow.com/questions/76582539/capture-text-between-the-start-of-one-tag-and-beginning-of-another

2 Answers


czq61nw11#

Does this give you the result you want?

import re

string = '''<<John Smith, Youtube>> 
I'm having a great day today 
<<Jane Doe, Google>> 
I'm going to the gym later 
<<Speaker>>
Time for people to speak 
<<Beff Jezos>> 
Buy something from my online shop. You might like it'''

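# <<(.*?)>>     -> group 1: the identifier between << and >>
# \s*(.*?)\s*   -> group 2: the text that follows, surrounding whitespace dropped
# (?=(?:<<|$))  -> stop just before the next << or at the end of the string
pattern = r'<<(.*?)>>\s*(.*?)\s*(?=(?:<<|$))'
# re.DOTALL lets . match newlines, so a match can span several lines
matches = re.findall(pattern, string, re.DOTALL)

# findall returns one (identifier, content) tuple per match
result = []
for match in matches:
    identifier = match[0]
    content = match[1]
    result.append((identifier, content))

print(result)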

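So what does the pattern do?
<<(.*?)>> captures the content between << and >>.
\s* matches any whitespace characters.
(.*?) captures the content after >> up to the next << (or the end of the string).
(?=(?:<<|$)) is a lookahead: it requires the next << or the end of the string to follow, without consuming it.
Edit: Tim's answer is simpler and better explained.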
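Since the question mentions that identifiers can repeat, a plain dictionary would overwrite earlier entries. Here is a minimal sketch of one way around that, collecting the texts in a list per identifier. The name `grouped` and the repeated `<<Speaker>>` entry in the sample input are made up for illustration; the pattern is the one from this answer.

```python
import re
from collections import defaultdict

# Sample input; the second <<Speaker>> block is invented to show a repeated identifier.
text = (
    "<<Speaker>>\nTime for people to speak \n"
    "<<Beff Jezos>> \nBuy something from my online shop. You might like it\n"
    "<<Speaker>>\nThank you all for coming"
)

pattern = r'<<(.*?)>>\s*(.*?)\s*(?=(?:<<|$))'

# Map each identifier to every piece of text that appeared under it.
grouped = defaultdict(list)
for identifier, content in re.findall(pattern, text, re.DOTALL):
    grouped[identifier].append(content)

for identifier, contents in grouped.items():
    print(identifier, contents)
```

If the order of the statements matters more than grouping them, the list of tuples from `re.findall` is probably the better structure to keep.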

2023-06-30

chy5wohz2#

We can do a regex find-all search as follows:

matches = re.findall(r'<<(.*?)(?=\s*<<|$)', string, flags=re.S)
print(matches)

["John Smith, Youtube>> \nI'm having a great day today",
 "Jane Doe, Google>> \nI'm going to the gym later",
 'Speaker>>\nTime for people to speak',
 'Beff Jezos>> \nBuy something from my online shop. You might like it']

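The regex pattern used here says to match:

<< the literal start of a tag
(.*?) match and capture all content until reaching the nearest
(?=\s*<<|$) optional whitespace followed by << (the start of the next tag), or the end of the input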
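To see what the `\s*` inside the lookahead contributes, here is a minimal sketch that runs the pattern with and without it on a shortened copy of the input. The names `text`, `loose` and `tight` are only for illustration.

```python
import re

# Shortened copy of the question's input (first two speakers only).
text = (
    "<<John Smith, Youtube>> \n"
    "I'm having a great day today \n"
    "<<Jane Doe, Google>> \n"
    "I'm going to the gym later"
)

# Without \s* in the lookahead, the lazy capture runs right up to the next <<,
# so it keeps the whitespace that precedes that tag.
loose = re.findall(r'<<(.*?)(?=<<|$)', text, flags=re.S)

# With \s*, the capture can stop as soon as only whitespace separates it
# from the next <<.
tight = re.findall(r'<<(.*?)(?=\s*<<|$)', text, flags=re.S)

print(repr(loose[0]))
print(repr(tight[0]))
```

The first result still carries the trailing space and newline, while the second ends cleanly at "today".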

Note that we run the regex search in dotall mode, as indicated by the re.S flag, so that .* can match across lines.
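As a quick check of why the flag matters, the sketch below runs the same pattern with and without `re.S` on a shortened copy of the input. The variable names are illustrative only.

```python
import re

# Shortened copy of the question's input (first two speakers only).
text = (
    "<<John Smith, Youtube>> \n"
    "I'm having a great day today \n"
    "<<Jane Doe, Google>> \n"
    "I'm going to the gym later"
)

pattern = r'<<(.*?)(?=\s*<<|$)'

# Default mode: . does not match a newline, so the capture can never get
# from a tag line to the text on the following line.
without_dotall = re.findall(pattern, text)

# Dotall mode: . matches any character, newlines included.
with_dotall = re.findall(pattern, text, flags=re.S)

print(without_dotall)
print(with_dotall)
```

On this input the first call finds nothing at all, while the second returns one entry per tag. Writing the pattern as `(?s)<<(.*?)(?=\s*<<|$)` should behave the same way, with the flag set inline instead of through the `flags` argument.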

2023-06-30
