regex: Using a regular expression in Python to split text into speaker and text?

n1bvdmb6 posted on 2023-08-08 in Python

I have the following string:

JOHN SMITH, GLOBAL HEAD OF YOUTUBE : Good morning, good 
afternoon, everyone . Before I hand over to facebook, I want to give a quick reminder of the reporting 
changes that have taken effect this filming of a tv show.  
 

 
BOBBY DUDE, GROUP FROM FACEBOOK:     Thanks, john smith lets talk about movies and films we watch when we are bored parents.

How can I create a regex pattern that splits the text into speaker and text? For example, to get this result:

string1: (speaker = JOHN SMITH, GLOBAL HEAD OF YOUTUBE, text  =  Good morning, good 
afternoon, everyone . Before I hand over to facebook, I want to give a quick reminder of the reporting 
changes that have taken effect this filming of a tv show.  
 )


and so on.

Answer #1 (xmakbtuz)

You can try this (regex101):

import re

text = """\
JOHN SMITH, GLOBAL HEAD OF YOUTUBE : Good morning, good
afternoon, everyone . Before I hand over to facebook, I want to give a quick reminder of the reporting
changes that have taken effect this filming of a tv show.


BOBBY DUDE, GROUP FROM FACEBOOK:     Thanks, john smith lets talk about movies and films we watch when we are bored parents. """

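# Group 1: a speaker label at the start of a line (no lowercase letters, no colon).
# Group 2: the speech, lazily matched up to the next speaker label or the end of the string.
out = re.findall(
    r"^([^a-z:]+?)\s*:\s*(.*?)\s*(?=^[^a-z:]+?:|\Z)", text, flags=re.S | re.M
)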

print(out)

Prints:

[
    (
        "JOHN SMITH, GLOBAL HEAD OF YOUTUBE",
        "Good morning, good\nafternoon, everyone . Before I hand over to facebook, I want to give a quick reminder of the reporting\nchanges that have taken effect this filming of a tv show.",
    ),
    (
        "BOBBY DUDE, GROUP FROM FACEBOOK",
        "Thanks, john smith lets talk about movies and films we watch when we are bored parents.",
    ),
]
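
To make the pattern easier to read, here is a sketch of the same regex in verbose mode with named groups, returning one dict per speaker turn. The names `SPEAKER_TURN`, `split_turns` and `sample` are illustrative and not part of the original answer. It relies on the same assumption as the pattern above: a speaker label starts a line and contains no lowercase letters or colons.

```python
import re

# Same pattern as in the answer, spelled out with re.X (verbose) and named groups.
# Assumption: speaker labels start a line and contain no lowercase letters or colons.
SPEAKER_TURN = re.compile(
    r"""
    ^(?P<speaker>[^a-z:]+?)   # label at line start, as short as possible
    \s*:\s*                   # the colon separator plus surrounding whitespace
    (?P<text>.*?)             # the speech itself, lazily matched (re.S lets . span lines)
    \s*                       # trailing whitespace is left out of the text group
    (?=^[^a-z:]+?:|\Z)        # stop before the next label or at the end of the string
    """,
    flags=re.S | re.M | re.X,
)


def split_turns(transcript):
    """Return a list of {'speaker': ..., 'text': ...} dicts, one per speaker turn."""
    return [m.groupdict() for m in SPEAKER_TURN.finditer(transcript)]


# Shortened sample in the same shape as the question's input.
sample = (
    "JOHN SMITH, GLOBAL HEAD OF YOUTUBE : Good morning, good\n"
    "afternoon, everyone .\n"
    "\n"
    "BOBBY DUDE, GROUP FROM FACEBOOK:     Thanks, john smith.\n"
)

turns = split_turns(sample)
for turn in turns:
    print(turn["speaker"], "->", repr(turn["text"]))
```

One limitation to keep in mind: any line inside a speech that begins with non-lowercase characters followed by a colon (for example `NOTE: ...`) would also be treated as a new speaker label, so the pattern may need tightening for messier transcripts.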
