Regular expression to search for a string in a text file

Posted by zpgglvta on 2023-01-14 in Other


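I wrote the code below to extract two values from one specific line of a text file. The file contains many lines of information, and I am trying to find the following line: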

2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856

From this line I am extracting the time (11:15:09) and the bandwidth (1751856).

import re
import matplotlib.pyplot as plt
import sys

time =[]
bandwidth = []
myfile = open(sys.argv[1])
for line in myfile:
    line = line.rstrip()
    if re.findall('TMMBR with bps:',line):
        time.append(line[12:19])
        bandwidth.append(line[-7:])

plt.plot(time,bandwidth)
plt.xlabel('time')
plt.ylabel('bandwidth')   
plt.title('TMMBR against time')
plt.legend()
plt.show()

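The problem is that I use absolute index values (line[12:19]) to extract the data, which stops working if the line contains extra characters or extra whitespace. What regular expression can I use to extract these values? I am new to regular expressions.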

regex

Source: https://stackoverflow.com/questions/75099798/regular-expression-to-search-string-from-a-text-file

4 answers


ztigrdn8 #1

Try this:

(?:\d+:\d+:|(?<=TMMBR with bps: ))\d+

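(?:\d+:\d+:|(?<=TMMBR with bps: )) a non-capturing group containing two alternatives.
\d+: one or more digits followed by a colon :.
\d+: one or more digits followed by a colon :.
| or
(?<=TMMBR with bps: ) a lookbehind: a position preceded by the text TMMBR with bps: (including the trailing space).
\d+ one or more digits.

See the regex demo.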

import re

txt1 = '2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856'

res = re.findall(r'(?:\d+:\d+:|(?<=TMMBR with bps: ))\d+', txt1)

print(res[0]) #Output: 11:15:09

print(res[1]) #Output: 1751856

2023-01-14

tct7dpnv #2

You can be a bit more specific and use 2 capture groups:

(\d\d:\d\d:\d\d)\.\d{3}\b.*\bTMMBR with bps:\s*(\d+)\b

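Explanation
(\d\d:\d\d:\d\d) capture group 1, matches a time-like format
\.\d{3}\b matches a dot and 3 digits, followed by a word boundary
.* matches the rest of the line, then backtracks until the remaining pattern can match
\bTMMBR with bps:\s* a word boundary, then TMMBR with bps: and optional whitespace characters
(\d+) capture group 2, matches 1 or more digits
\b a word boundary

See the regex101 demo and the Python demo.
Example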

import re

s = r"2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856"
pattern = r"(\d\d:\d\d:\d\d)\.\d{3}\b.*\bTMMBR with bps:\s*(\d+)\b"
m = re.search(pattern, s)
if m:
    print(m.groups())

Output

('11:15:09', '1751856')
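
To connect this pattern back to the loop in the question, here is a minimal sketch, assuming every relevant line follows the log format shown above. The helper name parse_tmmbr and the sample lines are hypothetical and not part of the original answer; the bandwidth is converted to int so that it can be plotted on a numeric axis.

```python
import re

# Pattern from the answer above: group 1 is the time, group 2 the bandwidth.
PATTERN = re.compile(r"(\d\d:\d\d:\d\d)\.\d{3}\b.*\bTMMBR with bps:\s*(\d+)\b")


def parse_tmmbr(lines):
    """Return (times, bandwidths) for every line that reports a TMMBR value."""
    times, bandwidths = [], []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            times.append(m.group(1))
            bandwidths.append(int(m.group(2)))  # numeric, so it plots correctly
    return times, bandwidths


# Hypothetical sample input: the second line has extra spaces,
# the third line is unrelated and should be skipped.
sample = [
    "2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856",
    "  2022-05-03 11:15:10.120 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps:   98000  ",
    "2022-05-03 11:15:11.000 [6489266] | (other.cc:1): something else",
]
times, bandwidths = parse_tmmbr(sample)
print(times, bandwidths)
```

Because the pattern anchors on the content rather than on character positions, leading or trailing whitespace does not shift the extracted values.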

2023-01-14

nfg76nw0 #3

You can simply use split:

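BPS_SEPARATOR = "TMMBR with bps: "
time, bandwidth = [], []
for line in strings:  # strings: any iterable of log lines, e.g. the open file
    line = line.rstrip()
    if BPS_SEPARATOR in line:
        # The second space-separated field is "11:15:09.395"; drop the milliseconds.
        time.append(line.split(" ")[1].split(".")[0])
        # Everything after the separator is the bandwidth value.
        bandwidth.append(line.split(BPS_SEPARATOR)[1])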

2023-01-14

u91tlkcl #4

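Use a context manager to handle the file.
Do not use re.findall merely to check whether a pattern occurs in a string; it is inefficient. For the regex case, use re.search instead.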
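
As a small illustration of the second point, the sketch below contrasts the two calls on a made-up line (the sample text and variable names are assumptions for demonstration only). re.findall scans the whole string and builds a list of every match, while re.search stops at the first match and returns a match object or None.

```python
import re

# Made-up line that contains the marker twice.
line = "BwMgr Received a TMMBR with bps: 1751856, earlier TMMBR with bps: 1500000"

# re.findall scans the entire string and builds a list of every match.
all_matches = re.findall(r"TMMBR with bps:", line)

# re.search stops at the first match and returns a match object, or None.
first_match = re.search(r"TMMBR with bps:", line)

# For a fixed substring, no regular expression is needed at all.
has_marker = "TMMBR with bps:" in line

print(len(all_matches), first_match is not None, has_marker)
```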

In your case, splitting the line and taking the parts you need is enough:

with open(sys.argv[1]) as myfile:
    ...
    if 'TMMBR with bps:' in line:
        parts = line.split()
        time.append(parts[1][:-4])
        bandwidth.append(parts[-1])
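
For reference, here is one way the fragment above might look as a complete script. This is only a sketch: it assumes the elided part is the line loop from the question, and the function name read_tmmbr and the temporary log file are hypothetical additions so that the example can be run as-is.

```python
import os
import tempfile


def read_tmmbr(path):
    """Collect times and bandwidths from TMMBR lines using str.split only."""
    times, bandwidths = [], []
    with open(path) as myfile:  # the file is closed automatically on exit
        for line in myfile:
            if "TMMBR with bps:" in line:
                parts = line.split()  # splits on any run of whitespace
                times.append(parts[1][:-4])  # "11:15:09.395" -> "11:15:09"
                bandwidths.append(int(parts[-1]))
    return times, bandwidths


# Write a throwaway log file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): "
              "BwMgr Received a TMMBR with bps: 1751856\n")
    tmp.write("2022-05-03 11:15:09.500 [6489266] | unrelated line\n")
    log_path = tmp.name

times, bandwidths = read_tmmbr(log_path)
os.remove(log_path)
print(times, bandwidths)
```

Because split() without arguments collapses runs of whitespace, extra spaces do not shift the fields. The approach still assumes that the time is the second field and the bandwidth is the last one.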

2023-01-14
