regex: Keep second-regex lines among the first-regex line matches

Posted by lvjbypge 11 months ago in Other


I have a large number of .txt list files in the E:\Desktop\Linux_distro\asliiiii directory. Here is a sample of one of my files:

95
ROSA
139
96
Chakra
137
97
AV Linux
135
98
LibreELEC
134
99
Simplicity
131
100
Kodachi
130
20200301020449
79776361952441

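Now I need a script that first finds the lines matching the \d{14} regex and then, among those found lines, keeps only the ones that also match the 20(?:0[0-9]|1[0-9]|20)[0-1][0-9] regex. All other lines should stay as they are.
In other words, the result must be: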

95
ROSA
139
96
Chakra
137
97
AV Linux
135
98
LibreELEC
134
99
Simplicity
131
100
Kodachi
130
20200301020449
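The requirement can be restated as a per-line rule: keep every line that is not a 14-digit run, and keep a 14-digit line only if it also matches the date prefix. A minimal in-memory sketch of that rule, assuming each timestamp sits alone on its own line; the helper `filter_lines` and the shortened `sample` list are hypothetical, not from the question:

```python
import re

# A line consisting of exactly 14 digits (candidate timestamp)
DIGITS_14 = re.compile(r'\d{14}')
# Year 2000-2020, two month digits, then the remaining 8 digits
DATE_14 = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')

def filter_lines(lines):
    """Keep non-timestamp lines; keep 14-digit lines only if they match DATE_14."""
    kept = []
    for line in lines:
        line = line.strip()
        if DIGITS_14.fullmatch(line) and not DATE_14.fullmatch(line):
            continue  # a 14-digit line that is not date-like: drop it
        kept.append(line)
    return kept

# Shortened stand-in for the file contents
sample = ['95', 'ROSA', '139', '100', 'Kodachi', '130',
          '20200301020449', '79776361952441']
print(filter_lines(sample))
```

Reading a file with `readlines()`, passing the result to `filter_lines`, and writing `'\n'.join(...)` back would give the expected output above for the full sample.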

I wrote the following Python script, but I don't know why it doesn't work on my lists!

import os
import re

def process_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Find lines matching \d{14}
    regex_pattern_1 = re.compile(r'\d{14}')
    matching_lines = [line.strip() for line in lines if regex_pattern_1.search(line)]

    # Keep only matches of the second regex in the found lines
    regex_pattern_2 = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')
    filtered_lines = []
    for line in matching_lines:
        matches = regex_pattern_2.findall(line)
        filtered_lines.extend(matches)

    # Write the filtered lines back to the file
    with open(file_path, 'w') as file:
        file.write('\n'.join(filtered_lines))

def process_files_in_directory(directory_path):
    for filename in os.listdir(directory_path):
        if filename.endswith('.txt'):
            file_path = os.path.join(directory_path, filename)
            process_file(file_path)

if __name__ == "__main__":
    directory_path = r'E:\Desktop\Linux_distro\asliiiii'
    process_files_in_directory(directory_path)
    print("Processing complete.")

But this script gives me only the following result:

20200301020449

What is the problem with this script?
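The cause is visible in the script's first step: the list comprehension keeps only lines that contain 14 digits, so the distribution names and numbers are discarded before the second regex is ever applied. A small sketch reproducing both steps on a shortened stand-in for the file contents (the list `sample` is made up for illustration):

```python
import re

# Shortened stand-in for file.readlines() output
sample = ['95\n', 'ROSA\n', '139\n', '20200301020449\n', '79776361952441\n']

# Same filter as the question's first step
regex_pattern_1 = re.compile(r'\d{14}')
matching_lines = [line.strip() for line in sample if regex_pattern_1.search(line)]
print(matching_lines)  # ['20200301020449', '79776361952441']

# The second step only ever sees those two lines, so only one survives
regex_pattern_2 = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')
filtered_lines = []
for line in matching_lines:
    filtered_lines.extend(regex_pattern_2.findall(line))
print(filtered_lines)  # ['20200301020449']
```

Because the script then overwrites the file with `filtered_lines`, every other line is lost; lines that do not match `\d{14}` need to be kept rather than filtered out first.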

regex

Source: https://stackoverflow.com/questions/77467788/keep-second-regex-lines-in-first-regex-lines-matches

3 answers


Answer #1 (wfypjpf4)

Try the following:

matches = regex_pattern_2.findall(line[:6])

Alternatively, adjust the pattern to include the remaining 8 characters:

20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}
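A caveat on the first suggestion, assuming it is paired with the shorter pattern that has no trailing `\d{8}`: `findall` returns the matched substrings, so searching `line[:6]` yields only the six-character prefix (for example `202003`), not the whole timestamp. A sketch comparing the two variants; `short_pattern` and `full_pattern` are names chosen here for illustration:

```python
import re

line = '20200301020449'

# Pattern from the question text, covering only 20YYMM (6 characters)
short_pattern = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]')
print(short_pattern.findall(line[:6]))  # ['202003']

# Pattern extended by the remaining 8 digits
full_pattern = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')
print(full_pattern.findall(line))  # ['20200301020449']

# To keep the whole line, test for a match instead of collecting matches
print(line if short_pattern.match(line) else None)  # 20200301020449
```

Either variant still needs the non-timestamp lines to be kept separately; neither change alone restores them.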


Answered 10 months ago

Answer #2 (2skhul33)

I mean, too many people use regular expressions to solve problems that don't actually need them.

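def process_file(fn):
    # Write the result to a new file next to the input, e.g. x.txt -> x.txt.out
    with open(fn) as fin, open(fn + '.out', 'w') as fout:
        # Copy every line up to and including the first 14-digit line
        for line in fin:
            line = line.strip()
            print(line, file=fout)
            if len(line) == 14 and line.isdigit():
                break

        # The second loop continues from the same file position:
        # keep only 14-digit lines that start with '20'
        for line in fin:
            line = line.strip()
            if len(line) == 14 and line.isdigit() and line.startswith('20'):
                print(line, file=fout)

process_file('x.txt')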

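Here I'm assuming that checking for "14 digits starting with '20'" is enough to find your timestamps, but if you really need to find valid dates, you could use a regex at this point.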
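If real date validation is needed, one alternative to a regex is to let `datetime` reject impossible values. A minimal sketch, assuming the 14 digits are laid out as YYYYMMDDHHMMSS (the question does not state this explicitly); `is_valid_timestamp` is a hypothetical helper name:

```python
from datetime import datetime

def is_valid_timestamp(s):
    """True if s is 14 digits forming a real date-time in the assumed YYYYMMDDHHMMSS layout."""
    if len(s) != 14 or not s.isdigit():
        return False
    try:
        # datetime raises ValueError for impossible fields (month 63, day 32, year 0, ...)
        datetime(int(s[0:4]), int(s[4:6]), int(s[6:8]),
                 int(s[8:10]), int(s[10:12]), int(s[12:14]))
    except ValueError:
        return False
    return True

print(is_valid_timestamp('20200301020449'))  # True
print(is_valid_timestamp('79776361952441'))  # False
```

In the second loop, the `len(line) == 14 and line.isdigit() and line.startswith('20')` condition could then be replaced by `is_valid_timestamp(line)`, optionally keeping the `'20'` prefix check.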
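Note that I write to a new file with a special name (the original name plus ".out"). You can delete the original and rename the new file at the end if you want.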
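The delete-and-rename step can be done in one call with `os.replace`, which overwrites the destination if it already exists. A sketch, assuming the processed copy was written as the original name plus `.out`; `finalize` is a hypothetical helper name:

```python
import os

def finalize(fn):
    """Replace the original file fn with the processed copy fn + '.out'."""
    # os.replace overwrites fn if it exists, so no separate delete is needed
    os.replace(fn + '.out', fn)
```

Calling `finalize('x.txt')` right after `process_file('x.txt')` would leave only the filtered `x.txt` in place.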

Answered 10 months ago

Answer #3 (mutmk8jj)

The following script works fine for me:

import os
import re

def process_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Keep lines that match the second regex or do not match any regex
    regex_pattern_2 = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')
    filtered_lines = [line.strip() for line in lines if regex_pattern_2.search(line) or not re.search(r'\d{14}', line)]

    # Write the filtered lines back to the file
    with open(file_path, 'w') as file:
        file.write('\n'.join(filtered_lines))

def process_files_in_directory(directory_path):
    for filename in os.listdir(directory_path):
        if filename.endswith('.txt'):
            file_path = os.path.join(directory_path, filename)
            process_file(file_path)

if __name__ == "__main__":
    directory_path = r'E:\Desktop\Linux_distro\asliiiii'
    process_files_in_directory(directory_path)
    print("Processing complete.")
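One subtlety in the filter above: `re.search` is unanchored, so `\d{14}` also matches any line that merely contains 14 consecutive digits, such as a 15-digit number, and `regex_pattern_2.search` likewise accepts a date-like run inside a longer line. If only whole-line timestamps should count, `re.fullmatch` on the stripped line is stricter (stripping matters because `readlines()` keeps the trailing newline). A quick comparison; the 15-digit test line is invented for illustration:

```python
import re

digits_14 = re.compile(r'\d{14}')

for line in ['20200301020449', '202003010204491', 'ROSA']:
    # search: a match anywhere in the line; fullmatch: the whole line must match
    print(line, bool(digits_14.search(line)), bool(digits_14.fullmatch(line)))
# 20200301020449 True True
# 202003010204491 True False
# ROSA False False
```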


Answered 10 months ago
