I have the following string:
<<John Smith, Youtube>>
I'm having a great day today
<<Jane Doe, Google>>
I'm going to the gym later
<<Speaker>>
Time for people to speak
<<Beff Jezos>>
Buy something from my online shop. You might like it
You can load this string in Python with the following statement:
string = '''<<John Smith, Youtube>>
I'm having a great day today
<<Jane Doe, Google>>
I'm going to the gym later
<<Speaker>>
Time for people to speak
<<Beff Jezos>>
Buy something from my online shop. You might like it'''
Import the Python package:
import re
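I am trying to find a way to capture the following: I want to extract all of the text from one **<<** up to the next **<<**.
For example, this means extracting the following strings: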
string1: John Smith, Youtube>>
I'm having a great day today
string2: Jane Doe, Google>>
I'm going to the gym later
string3: Speaker>>
Time for people to speak
string4: Beff Jezos>>
Buy something from my online shop. You might like it
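The output can be a list or a dictionary of key-value pairs. The values between the << and >> tags are identifiers, but they are not always unique; some will repeat.
Any help is appreciated. My current regex has only got me this far: /(?=<<)(.*)(?=<<)/gm
New string: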
Welcome to the first meeting today between Yotube, Google and Amazon Special guest speaker today is Beff Jezos
<<John Smith, Youtube>> I'm having a great day today
<<Jane Doe, Google>> I'm going to the gym later
<<Speaker>> Time for people to speak
<<Beff Jezos>> Buy something from my online shop. You might like it
2 Answers
czq61nw1 #1
Does this give you the result you want?
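A minimal sketch of a pattern assembled from the three pieces this answer goes on to explain. The closing part `\s*(?=<<|$)`, the `re.S` flag, and the names `pattern` and `pairs` are assumptions made for illustration, not necessarily what this answer used.

```python
import re

string = '''<<John Smith, Youtube>>
I'm having a great day today
<<Jane Doe, Google>>
I'm going to the gym later
<<Speaker>>
Time for people to speak
<<Beff Jezos>>
Buy something from my online shop. You might like it'''

# <<(.*?)>>    group 1: the identifier between << and >>
# \s*          skip whitespace (including the newline) after >>
# (.*?)        group 2: the text that follows, matched lazily
# \s*(?=<<|$)  assumed terminator: stop before the next << or at the end
pattern = r'<<(.*?)>>\s*(.*?)\s*(?=<<|$)'

# re.S lets "." match newlines; with two groups, findall returns
# one (identifier, text) tuple per match.
pairs = re.findall(pattern, string, flags=re.S)

for identifier, text in pairs:
    print(f'{identifier!r}: {text!r}')
```

Since the identifiers can repeat, a list of tuples keeps every entry, whereas a plain dict keyed on the identifier would overwrite duplicates.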
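So what does it do?
<<(.*?)>> captures the content between << and >>.
\s* matches any whitespace characters.
(.*?) captures the content after >>, up to the next <<.
Edit: Tim's answer is simpler and better explained.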
chy5wohz #2
We can do a regex find-all search as follows:
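A minimal sketch of such a search, assuming the pattern is the three pieces from the explanation joined together, `<<(.*?)(?=\s*<<|$)`, and that it is run with `re.findall` and the `re.S` flag. The variable name `matches` is hypothetical.

```python
import re

string = '''<<John Smith, Youtube>>
I'm having a great day today
<<Jane Doe, Google>>
I'm going to the gym later
<<Speaker>>
Time for people to speak
<<Beff Jezos>>
Buy something from my online shop. You might like it'''

# Match "<<", then lazily capture everything up to (but not including)
# optional whitespace followed by the next "<<", or the end of the input.
matches = re.findall(r'<<(.*?)(?=\s*<<|$)', string, flags=re.S)

for i, m in enumerate(matches, start=1):
    print(f'string{i}: {m}')
```

Each item keeps the closing >> and the text that follows it, which is the shape asked for in the question.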
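The regex pattern used here says to match:
<<             the opening <<
(.*?)          match and capture all content until reaching the nearest
(?=\s*<<|$)    optional whitespace followed by <<, the start of the next marker, or the end of the input
Note that we run the regex search in dotall mode, as indicated by the re.S flag, so .* will match across lines.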
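To illustrate why dotall mode matters here, the following sketch runs the same pattern with and without `re.S` on a shortened two-entry version of the sample. The pattern is assumed to be `<<(.*?)(?=\s*<<|$)`, and the names `sample`, `with_dotall`, and `without_dotall` are hypothetical.

```python
import re

# Shortened version of the sample: each marker is followed by a newline.
sample = ("<<Speaker>>\nTime for people to speak\n"
          "<<Beff Jezos>>\nBuy something from my online shop. You might like it")

pattern = r'<<(.*?)(?=\s*<<|$)'

# With re.S, "." also matches "\n", so a capture can span several lines.
with_dotall = re.findall(pattern, sample, flags=re.S)

# Without re.S, "." stops at the first newline, and the lookahead cannot
# succeed there because the text after the newline is not "<<".
without_dotall = re.findall(pattern, sample)

print(with_dotall)
print(without_dotall)
```

On this sample the call without the flag finds nothing, because every marker is separated from the next << by a line of text that `.` cannot cross.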