Regex pattern to exclude timestamps

Posted by fhity93d on 2023-10-22 in Other

I have the following text:

Master of the universe\n\n(Jul 26, 2023 - 1:00pm)\n\n(Interviewee: Marina)\n\n\n\n(00:00:05 - 00:00:09)\n\n\t Alice: This project. Uh my job is to ask lots of questions.\n\n\n\n(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n(00:00:11 - 00:00:14)\n\n\t Alice: Uh uh impartially.\n\n\n\n(00:00:15 - 00:00:18)\n\n\t Alice: Uh so suddenly I don't work for a particular brand.\n\n\n\n(00:00:19 - 00:00:21)\n\n\t Alice: Uh I'm self-employed,\n\n\n\n(00:00:21 - 00:00:21)\n\n\t Marina: M M.\n\n\n\n(00:00:21 - 00:00:32)\n\n\t Alice: I do group interviews with lots of brands, from toothpaste to the product we're going to talk about today.\n\n\n\n(00:00:32 - 00:00:32)\n\n\t Marina: Okay.\n\n\n\n(00:00:33 - 00:00:37)\n\n\t Alice: Uh today we're gonna talk for an hour uh.\n\n\n\n(00:00:36 - 00:00:36)\n\n\t Marina: Okay.\n\n\n\n(00:00:37 - 00:00:39)\n\n\t

From the text above, I want to extract only the name: text lines. For example:

Alice: This project. Uh my job is to ask lots of questions.
Marina: What is it?
Alice: Uh uh impartially.
Alice: Uh so suddenly I don't work for a particular brand.
Alice: Uh I'm self-employed,
Marina: M M.
Alice: I do group interviews with lots of brands, from toothpaste to the product we're going to talk about today.
Marina: Okay.
Alice: Uh today we're gonna talk for an hour uh.
Marina: Okay.

With this regex I can match the timestamps, but I can't exclude them:

(?:[\\n]+\(\d{2}:\d{2}:\d{2} - \d{2}:\d{2}:\d{2}\)[\\n\\t\\s]+|$)

I need a regex pattern that excludes all the timestamps and other text, keeping only the name: text lines shown above.

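Edit: I forgot to mention that lines whose speaker matches the interviewee's name should also be excluded.
P.S.: I don't want Python code that does a regex substitution with the pattern above. I just want one complete pattern that finds the name: text matches.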

Answer #1

Test code: https://ideone.com/egOlTP
I would use the re module for something like this:

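import re

# input_text is assumed to hold the transcript from the question.
# Capture a run of word characters, then ": ", then the rest of that line.
pattern = r'(\w+): (.+)'

matches = re.findall(pattern, input_text)

for name, text in matches:
    print(f"{name}: {text}")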

This prints the lines you're looking for. Hope this helps.

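If you don't want anything else (such as the Interviewee line) to show up in the output, replace the pattern

pattern = r'(\w+): (.+)'

with this one, which requires a space right before the name:

pattern = r' (\w+): (.+)'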
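To see why the leading space matters, here is a small check of both patterns on a shortened excerpt of the question's transcript. The variable `sample` is an assumption: it reproduces part of the question's text with real newline and tab characters rather than literal `\n`/`\t` sequences.

```python
import re

# Hypothetical excerpt of the question's transcript (real newlines and tabs).
sample = (
    "Master of the universe\n\n(Jul 26, 2023 - 1:00pm)\n\n"
    "(Interviewee: Marina)\n\n\n\n"
    "(00:00:05 - 00:00:09)\n\n\t Alice: This project.\n\n\n\n"
    "(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n"
)

# Original pattern: also matches "Interviewee: Marina)" from the header.
without_space = re.findall(r'(\w+): (.+)', sample)

# Revised pattern: the name must be preceded by a space, and "Interviewee"
# is preceded by "(", so the header line no longer matches.
with_space = re.findall(r' (\w+): (.+)', sample)

print(without_space)
print(with_space)
```

Note that neither pattern drops the interviewee's own lines (Marina's), which the question's edit asks for.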
Answer #2

  • "...I need a regex pattern that excludes all the timestamps and other text, keeping only the name: text lines shown above..."

You can use the following pattern.

\t (.+?): *(.+?)(?:\n)+\(\d
  • "...exclude the lines that match the interviewee's name..."

You'll need to capture this value first.

\(Interviewee: (.+?)\)

Here is an example, where s is the provided text.

import re
name = re.search(r'\(Interviewee: (.+?)\)', s).group(1)  # the interviewee's name
d = []
for m in re.finditer(r'\t (.+?): *(.+?)(?:\n)+\(\d', s):
    if m.group(1) != name:  # skip lines spoken by the interviewee
        d.append({m.group(1): m.group(2)})
for x in d:
    print(x)

Output:

{'Alice': 'This project. Uh my job is to ask lots of questions.'}
{'Alice': 'Uh uh impartially.'}
{'Alice': "Uh so suddenly I don't work for a particular brand."}
{'Alice': "Uh I'm self-employed,"}
{'Alice': "I do group interviews with lots of brands, from toothpaste to the product we're going to talk about today."}
{'Alice': "Uh today we're gonna talk for an hour uh."}
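Since the question asks for a single pattern, one possible variant (a sketch, not part of the original answer) moves the interviewee check into the regex itself with a negative lookahead `(?!...)`. It assumes the interviewee's name is first read from the header as above; `re.escape` guards against names containing regex metacharacters, and the trailing `:` in the lookahead makes it reject only an exact speaker-name match. The string `s` below is a hypothetical shortened excerpt with real newlines and tabs.

```python
import re

# Hypothetical excerpt of the transcript (real newlines and tabs).
s = (
    "(Interviewee: Marina)\n\n\n\n"
    "(00:00:05 - 00:00:09)\n\n\t Alice: This project.\n\n\n\n"
    "(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n"
    "(00:00:11 - 00:00:14)\n\n\t Alice: Uh uh impartially.\n\n\n\n"
    "(00:00:15 - 00:00:18)\n\n\t"
)

name = re.search(r'\(Interviewee: (.+?)\)', s).group(1)

# Same pattern as in the answer, plus a negative lookahead right after
# "\t " that rejects any speaker whose name is exactly the interviewee's.
pattern = r'\t (?!' + re.escape(name) + r':)(.+?): *(.+?)(?:\n)+\(\d'

lines = [f"{who}: {text}" for who, text in re.findall(pattern, s)]
for line in lines:
    print(line)
```

If no Python at all is wanted around the pattern, the name can be hard-coded, e.g. `\t (?!Marina:)(.+?): *(.+?)(?:\n)+\(\d`; the trade-off is that the pattern then has to be edited for every transcript.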
