I have the following text:
Master of the universe\n\n(Jul 26, 2023 - 1:00pm)\n\n(Interviewee: Marina)\n\n\n\n(00:00:05 - 00:00:09)\n\n\t Alice: This project. Uh my job is to ask lots of questions.\n\n\n\n(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n(00:00:11 - 00:00:14)\n\n\t Alice: Uh uh impartially.\n\n\n\n(00:00:15 - 00:00:18)\n\n\t Alice: Uh so suddenly I don't work for a particular brand.\n\n\n\n(00:00:19 - 00:00:21)\n\n\t Alice: Uh I'm self-employed,\n\n\n\n(00:00:21 - 00:00:21)\n\n\t Marina: M M.\n\n\n\n(00:00:21 - 00:00:32)\n\n\t Alice: I do group interviews with lots of brands, from toothpaste to the product we're going to talk about today.\n\n\n\n(00:00:32 - 00:00:32)\n\n\t Marina: Okay.\n\n\n\n(00:00:33 - 00:00:37)\n\n\t Alice: Uh today we're gonna talk for an hour uh.\n\n\n\n(00:00:36 - 00:00:36)\n\n\t Marina: Okay.\n\n\n\n(00:00:37 - 00:00:39)\n\n\t
From the text above, I want to extract the name: text pairs. For example:
Alice: This project. Uh my job is to ask lots of questions.
Marina: What is it?
Alice: Uh uh impartially.
Alice: Uh so suddenly I don't work for a particular brand.
Alice: Uh I'm self-employed,
Marina: M M.
Alice: I do group interviews with lots of brands, from toothpaste to the product we're going to talk about today.
Marina: Okay.
Alice: Uh today we're gonna talk for an hour uh.
Marina: Okay.
I am able to match the timestamps with this regex, but I cannot exclude them:
(?:[\\n]+\(\d{2}:\d{2}:\d{2} - \d{2}:\d{2}:\d{2}\)[\\n\\t\\s]+|$)
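I need a regex pattern that excludes all the timestamps and other text and keeps only name: text, as shown above.
Edit: I forgot to mention that the line with the Interviewee name should be excluded.
P.S.: I don't want Python code that does a regex substitution with the pattern above. I just want a complete pattern that finds the name: text matches.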
2 answers
pgky5nke1#
Test code: https://ideone.com/egOlTP
I would use the re module for something like this. It prints the pattern you are looking for. Hope this helps.
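A minimal sketch of this re-based approach might look like the following. It assumes the transcript is held in a Python string s with real newline and tab characters; the pattern and the variable names are illustrative assumptions and are not taken from the linked ideone code. Note that a loose pattern like this also picks up the Interviewee line.

```python
import re

# Assumption: a shortened copy of the transcript, with real "\n" and "\t" characters.
s = (
    "Master of the universe\n\n(Jul 26, 2023 - 1:00pm)\n\n(Interviewee: Marina)\n\n\n\n"
    "(00:00:05 - 00:00:09)\n\n\t Alice: This project. Uh my job is to ask lots of questions.\n\n\n\n"
    "(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n"
)

# Loose pattern: a word, then a colon and a space, then the rest of the line.
# Timestamps such as 00:00:05 do not match because no space follows their colons.
loose = re.compile(r"(\w+): (.+)")

for name, text in loose.findall(s):
    print(f"{name}: {text}")
```

With this sample the loop prints the Alice and Marina lines, but also "Interviewee: Marina)", because the header line has the same word-colon-space shape.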
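If you don't want anything else (such as the Interviewee line) to show up in the output, replace the pattern with one that skips those lines.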
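One way to tighten the pattern, sketched under the same assumption that s contains real newline and tab characters, is to anchor the match at the start of a line and allow only spaces or tabs before the name. The Interviewee line and the timestamps start with an opening parenthesis, so they no longer match. The pattern below is an illustrative assumption, not the answer's original replacement.

```python
import re

# Assumption: a shortened copy of the transcript, with real "\n" and "\t" characters.
s = (
    "Master of the universe\n\n(Jul 26, 2023 - 1:00pm)\n\n(Interviewee: Marina)\n\n\n\n"
    "(00:00:05 - 00:00:09)\n\n\t Alice: This project. Uh my job is to ask lots of questions.\n\n\n\n"
    "(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n"
)

# re.MULTILINE makes ^ and $ match at every line boundary.
# [ \t]* is used instead of \s* so the leading indentation cannot swallow newlines.
strict = re.compile(r"^[ \t]*(\w+): (.+)$", re.MULTILINE)

pairs = strict.findall(s)
for name, text in pairs:
    print(f"{name}: {text}")
```

Lines that begin with "(" are skipped entirely, so only the speaker lines remain in pairs.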
c0vxltue2#
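"... name: text as shown above. ..."
You can use the following pattern.
You will need to capture this value first.
Here is an example, where s is the provided text.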
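As a hedged illustration of the capture-based idea, the sketch below matches each timestamp without capturing it and captures only the name and the text that follow. It assumes s contains real newline and tab characters; the pattern and the group names name and text are assumptions made for this example, not the pattern from the original answer.

```python
import re

# Assumption: a shortened copy of the transcript, with real "\n" and "\t" characters.
s = (
    "(Interviewee: Marina)\n\n\n\n"
    "(00:00:05 - 00:00:09)\n\n\t Alice: This project. Uh my job is to ask lots of questions.\n\n\n\n"
    "(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n"
    "(00:00:37 - 00:00:39)\n\n\t"
)

pattern = re.compile(
    r"\(\d{2}:\d{2}:\d{2} - \d{2}:\d{2}:\d{2}\)"  # timestamp: matched, not captured
    r"\s+"                                        # blank lines and indentation
    r"(?P<name>\w+): (?P<text>.+)"                # captured speaker and utterance
)

# Build the "name: text" lines from the captured groups only.
lines = [f"{m['name']}: {m['text']}" for m in pattern.finditer(s)]
print("\n".join(lines))
```

Because every match has to start with a timestamp, the Interviewee line is never matched, and a trailing timestamp with no utterance after it is ignored.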