Regex function to extract selected lines

Posted by zphenhs4 on 2023-02-25 in Other

I have a text file:

Some text and random stuff that I don't need

2 8 
2 9 T
4 9
1 10 
2 10 F
7 11 T

More random stuff

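How should I construct a regex to extract the lines that contain only numbers, as well as the lines that contain numbers followed by T or F?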

with open(file, 'r') as log_file:
    # opening file
    file = log_file
    while True:
        line = file.readlines()

        # if line in regex function:
        data.append(line)
        # closing file
        break

How can I solve this?


Answer 1 (nhjlsmyf)

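With this approach, the re pattern matches only lines that consist of two numbers, optionally followed by the letter T or F. You can also use a for loop instead of the while loop.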

import re

matched_data = []
with open(file, 'r') as log_file:
    data = log_file.readlines()
    
    for line in data:
        line = line.strip()
        if re.match(r'^\d+ \d+( [TF])?$', line):
            matched_data.append(line)
    
print(matched_data)

If some lines start with the letter instead (e.g. T 7 11) and you want to match those as well, replace the pattern above with r'^([TF] )?\d+ \d+( [TF])?$'
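A quick way to check a leading-letter variant is to run it over a few sample lines. The sketch below is only an illustration and not part of the original answer: the pattern wraps the optional leading letter in its own group so that both anchors apply to the whole line, and the names PATTERN and matches are made up for this example.

```python
import re

# Assumed pattern: optional leading "T "/"F ", two integers,
# then an optional trailing " T"/" F".
PATTERN = re.compile(r'^(?:[TF] )?\d+ \d+(?: [TF])?$')

def matches(line: str) -> bool:
    """Return True if the stripped line fits the pattern."""
    return PATTERN.match(line.strip()) is not None

samples = ["2 8 ", "2 9 T", "T 7 11", "5 B 37", "Y 9 G", "MG 99 Z", "Total 3"]
selected = [s.strip() for s in samples if matches(s)]
print(selected)
# ['2 8', '2 9 T', 'T 7 11']
```

Because the optional group sits inside the anchors, a line such as "Total 3" is rejected even though it begins with T.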

Test code:

import re

data = """
2 8 
2 9 T
4 9
1 10 
2 10 F
7 11 T
5 B 37
Y 9 G
T 7 11
MG 99 Z
"""

data = data.splitlines()
matched_data = []
for line in data:
    line = line.strip()
    if re.match(r'^\d+ \d+( [TF])?$', line):
        matched_data.append(line)
        
print(matched_data)
# ['2 8', '2 9 T', '4 9', '1 10', '2 10 F', '7 11 T']

Answer 2 (ars1skjm)

We can use re.findall() to get all matches in the whole file.

import re

regexp = r"^\d[\d ]*[TF]?$"

with open("file.txt", "r") as fp:
    # Not suggested if the file is large.
    data = fp.read()
    print(re.findall(regexp, data, re.M))

Output:

['2 8 ', '2 9 T', '4 9', '1 10 ', '2 10 F', '7 11 T']

For large files, it is better to iterate line by line.

data = []
with open(file, 'r') as fp:
    for line in fp:
        _match = re.match(regexp, line)
        if _match:
            data.append(_match.group())
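The line-by-line idea can also be wrapped in a generator, so that matching lines are produced lazily and nothing but the current line is held in memory. This is a minimal sketch under a few assumptions: the names LINE_RE and matching_lines are hypothetical, the pattern is the stricter "two integers plus optional flag" form, and io.StringIO stands in for the real file.

```python
import io
import re

# Assumed line shape: two integers and an optional T/F flag.
LINE_RE = re.compile(r'\d+ \d+(?: [TF])?')

def matching_lines(lines):
    """Yield stripped lines made up only of two integers and an optional flag."""
    for line in lines:
        stripped = line.strip()
        # fullmatch requires the whole stripped line to fit the pattern.
        if LINE_RE.fullmatch(stripped):
            yield stripped

# io.StringIO stands in for an open file so the example runs on its own.
sample = io.StringIO("Some text\n\n2 8 \n2 9 T\n4 9\nMore random stuff\n")
result = list(matching_lines(sample))
print(result)
# ['2 8', '2 9 T', '4 9']
```

With a real file, passing the open file object in place of sample should behave the same way, since file objects iterate line by line.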

If you are interested in learning more about regular expressions, see regexone.


Answer 3 (csga3l58)

You can also parse the matched lines into tuples of (int, int, bool | None):

import re

with open("file.txt", "r") as file:
    result = [
        (int(a), int(b), flag == "T" if flag else None)
        for a, b, flag in re.findall(r"^(\d+)[ ]+(\d+)(?:[ ]+([TF]))?[ ]*$", 
                                     file.read(), re.M)
    ]

print(result)

Output for the sample file:

[(2, 8, None), (2, 9, True), (4, 9, None), (1, 10, None), (2, 10, False), (7, 11, True)]
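The same tuple parsing can be done one line at a time, which avoids reading the whole file at once. The following is a sketch, not the answerer's code: ROW_RE, parse_rows and the group names a, b and flag are invented for the example, and io.StringIO again replaces the file.

```python
import io
import re
from typing import Iterable, Iterator, Optional, Tuple

# Named groups: two integers and an optional T/F flag, with flexible spacing.
ROW_RE = re.compile(r'(?P<a>\d+) +(?P<b>\d+)(?: +(?P<flag>[TF]))? *')

def parse_rows(lines: Iterable[str]) -> Iterator[Tuple[int, int, Optional[bool]]]:
    """Yield (int, int, bool | None) for every line that fits ROW_RE."""
    for line in lines:
        m = ROW_RE.fullmatch(line.rstrip("\n"))
        if m is None:
            continue  # skip lines that are not data rows
        flag = m.group("flag")  # None when the optional group did not match
        yield int(m.group("a")), int(m.group("b")), (flag == "T") if flag else None

sample = io.StringIO("header\n2 8 \n2 9 T\n2 10 F\nfooter\n")
rows = list(parse_rows(sample))
print(rows)
# [(2, 8, None), (2, 9, True), (2, 10, False)]
```

Note that a match object returns None for an optional group that did not participate, whereas re.findall() returns an empty string; the truthiness check on flag handles both.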
