Python: read a file, search with a regex, put the capture groups in an array, then read another file and check whether the strings exist

Posted by fhg3lkii on 2022-12-10 in Python


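I have a log file containing multi-line events with elements I need to capture. I then need to recursively search other files for the captured strings and write the results to a CSV. Currently I do this with multiple bash commands; it works, but it is ugly. The error log file can contain tens of thousands of lines, including hundreds of CRITICAL errors.
Log file (error.log):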

INFO ..some text.. title: (capture this title in capture group - title1)
    INFO ..some text.. path: (capture this url in capture group - url1)
    INFO ..some text..
    INFO ..some text.. version: (capture version in capture group - version1)
    INFO ..some text..
    INFO ..some text..
CRITICAL ..some text.. file/path (capture path (not file) in capture group - fp1) reason (capture reason in capture group - reason1)

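Recursively search files ending in *.foo123 for any match of the captured file path (fp1). Take elements from the path of each searched file: /some/path/(capture this - fp2)/(capture this - fp3)/(capture filename.foo123 - fname). If fp1 exists in any *.foo123 file, print in CSV format: fp2, fp3, fname, title1, version1, reason1, url1
I'm a complete noob, so please be gentle. My Google-fu attempts to munge something together have been a complete failure.
I write fp1 to unsupported.txt (by grepping error.log with a regex), one value per line.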

import os
ba = open('unsupported.txt', 'r')
ba1 = ba.readlines()

for folder, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.foo123'):
            fullpath = os.path.join(folder, file)
            with open(fullpath, 'r') as f:
                for line in f:
                    if any(ext in ba1 for ext in line):
                        print(line)

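This returns no results. It looks like ba1 is captured as an array. If I change if any(ext in ba1 for ext in line): to use a literal value, if any(ext in "bad_value" for ext in line):, it prints the contents of all files that match "bad_value". If I can't get this to work, I certainly can't accomplish anything else I want to do.
I have tried various other options based on examples I found while searching, but none of them got me where I need to be.
As a bonus, pointers to some reading material on the task I'm trying to accomplish would be great.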
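
The symptom described above is consistent with the generator iterating over the single characters of line rather than over the values in ba1: a single character is rarely equal to a whole list element (which also still carries its trailing newline), while nearly every line contains at least one character that appears in "bad_value". A minimal sketch of the likely fix follows; the helper names load_values and line_matches and the sample strings are made up for illustration.

```python
def load_values(lines):
    """Strip surrounding whitespace/newlines and drop blank lines."""
    return [ln.strip() for ln in lines if ln.strip()]


def line_matches(line, values):
    """Return True if any search value occurs as a substring of line."""
    return any(value in line for value in values)


# Simulated result of readlines() on unsupported.txt (hypothetical values).
raw = ["/some/bad/path\n", "/another/bad/path\n"]
values = load_values(raw)

# Hypothetical line from one of the *.foo123 files.
line = "entry=/some/bad/path/item status=failed\n"

# Original check: `ch` is one character of `line`, and no single
# character equals a list element such as "/some/bad/path\n".
original_result = any(ch in raw for ch in line)

# Reversed check: loop over the values, test each as a substring.
fixed_result = line_matches(line, values)

print(original_result, fixed_result)
```

Swapping the roles (loop over the search values, test each against the line) and stripping the newlines left by readlines() are the two changes that matter here.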

python

Source: https://stackoverflow.com/questions/74746097/python-to-read-file-search-regex-put-capture-groups-in-array-to-read-another

1 Answer


rnmwe5a21#

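If you haven't already, it is a good idea to get a debugger set up first.
Your problem statement is a bit unclear, but I would break it down into steps. Are you looping through the files the way you want? What do you want to do with each line? You mention wanting to do some checks using regex; this tool is a helpful place for that. You also mention wanting to add some data to a CSV. NumPy is a useful library that has functionality to write to CSV.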
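
The steps above (parse the log with a regex, walk the files, write CSV rows) could be sketched roughly as follows. This is only an illustration under loud assumptions: the exact log line layout is not given in the question, so the two patterns, the sample log text, the sample directory names, and the helpers parse_events and find_rows are all hypothetical. It also uses the standard-library csv module rather than NumPy, simply because the rows are plain strings.

```python
import csv
import io
import os
import re
import tempfile

# ASSUMED line formats; adjust both patterns to the real error.log.
INFO_RE = re.compile(
    r"^\s*INFO\b.*?\b(?P<key>title|path|version):\s*(?P<value>.+?)\s*$")
# fp1 is the directory part of the first /path/to/file before "reason".
CRIT_RE = re.compile(
    r"^\s*CRITICAL\b.*?(?P<fp1>/\S+)/[^/\s]+\s+reason\s+(?P<reason>.+?)\s*$")


def parse_events(lines):
    """Yield one dict per CRITICAL line, with the latest title/path/version."""
    current = {}
    for line in lines:
        m = INFO_RE.match(line)
        if m:
            if m["key"] == "title":
                current = {}  # assume a title line starts a new event
            current[m["key"]] = m["value"]
            continue
        m = CRIT_RE.match(line)
        if m:
            yield {**current, "fp1": m["fp1"], "reason": m["reason"]}


def find_rows(rootdir, events, suffix=".foo123"):
    """Yield a CSV row for each event whose fp1 occurs in a matching file."""
    for folder, _dirs, files in os.walk(rootdir):
        for name in files:
            if not name.endswith(suffix):
                continue
            with open(os.path.join(folder, name), "r", errors="replace") as fh:
                text = fh.read()
            # fp2 and fp3 are assumed to be the last two directory names.
            parts = ["", ""] + os.path.normpath(folder).split(os.sep)
            fp2, fp3 = parts[-2], parts[-1]
            for ev in events:
                if ev["fp1"] in text:
                    yield [fp2, fp3, name, ev.get("title", ""),
                           ev.get("version", ""), ev["reason"],
                           ev.get("path", "")]


# Made-up sample data standing in for error.log and one *.foo123 file.
sample_log = """\
INFO loader title: Widget A
    INFO loader path: https://example.com/a
    INFO loader started
    INFO loader version: 1.2
CRITICAL loader failed for /data/in/bad_item/thing.bin reason unsupported codec
"""
events = list(parse_events(sample_log.splitlines()))

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "groupA", "item1")
    os.makedirs(target)
    with open(os.path.join(target, "sample.foo123"), "w") as fh:
        fh.write("source=/data/in/bad_item\n")
    rows = list(find_rows(root, events))

out = io.StringIO()  # swap for open("result.csv", "w", newline="") on disk
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Reading each file once and testing every fp1 against its text keeps the walk simple; for a very large number of values, a single compiled alternation built with re.escape might be worth trying instead.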

2022-12-10
