regex: Common/unified regex for a set of patterns

Posted by iszxjhcz on 2023-06-07 in Other


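I'm trying to do some text processing and would like to know whether I can use one common/unified regex for a certain pattern. The strings I'm interested in are the values in the second column of test.csv that end in {string}_{i}, where i is a number. Once the regex matches, I want to replace the value with {string}[i].
Right now the Python script works as expected for the strings whose regex patterns I have written out explicitly. Instead of writing a separate regex for every pattern (which doesn't scale), I would like a more general regex that matches every string of the form {string}_{i}.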

input test.csv

bom_a14 , COMP_NUM_0
bom_a17 , COMP_NUM_2
bom_a27 , COMP_NUM_11
bom_a35 , FUNC_1V8_OLED_OUT_7
bom_a38 , FUNC_1V8_OLED_OUT_9
bom_a39 , FUNC_1V8_OLED_OUT_10
bom_a46 , CAP_4
bom_a47 , CAP_3
bom_a48 , CAP_6

test.py

import csv
import re

# Read test.csv and rewrite the value in the second column (row[1]) of each row
with open('test.csv', 'r') as file2:
    reader = csv.reader(file2)
    for row in reader:
        row_1=row[1]
        # for matching COMP_NUM_{X}
        match_data = re.match(r'([A-Z]+)_([A-Z]+)_(\d+)',row_1.strip())
        # for matching FUNC_1V8_OLED_OUT_{X}
        match_data2 = re.match(r'([A-Z]+)_([A-Z0-9]+)_([A-Z]+)_([A-Z]+)_(\d+)',row_1.strip())
        # if match found, reformat the data
        if match_data:
            new_row_1 = match_data.group(1) +'_'+ match_data.group(2)+ '[' + match_data.group(3) + ']'
        elif match_data2:
            new_row_1 = match_data2.group(1) +'_'+ match_data2.group(2)+ '_'+ match_data2.group(3)+'_'+ match_data2.group(4)+'[' + match_data2.group(5) + ']'
        else:
            new_row_1 = row_1
        print(new_row_1)

Output

COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
 CAP_4
 CAP_3
 CAP_6
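Why the CAP rows fall through to the else branch: a minimal check that reuses the two patterns from test.py on one row, parsed from an in-memory copy of the CSV (io.StringIO stands in for the file here, an assumption for illustration). The first pattern needs two letter-only segments before the number and the second needs four, so CAP_4 matches neither. The else branch then prints the raw field, which still carries the space that followed the comma; that is where the leading space in the output comes from.

```python
import csv
import io
import re

# The same two patterns as in test.py
pattern1 = r'([A-Z]+)_([A-Z]+)_(\d+)'
pattern2 = r'([A-Z]+)_([A-Z0-9]+)_([A-Z]+)_([A-Z]+)_(\d+)'

# One row of test.csv; csv.reader keeps the space after the comma
row = next(csv.reader(io.StringIO("bom_a46 , CAP_4\n")))
field = row[1]

print(repr(field))                        # ' CAP_4'
print(re.match(pattern1, field.strip()))  # None: only one segment before _4
print(re.match(pattern2, field.strip()))  # None
```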

Expected output

COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
CAP[4]
CAP[3]
CAP[6]

regex

Source: https://stackoverflow.com/questions/76389341/common-unified-regex-for-a-set-of-pattern

2 answers


wsxa1bj11#

I would use re.sub with a generic pattern:

import csv
import re

with open("test.csv", "r") as file2:
    for row in csv.reader(file2):

        s = re.sub(r"(.+)_(\d+)$", r"\1[\2]", row[-1].strip())

        print(s)

Regex: [demo]
Output:

COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
CAP[4]
CAP[3]
CAP[6]
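A sketch of why the single pattern covers every row, wrapped in a hypothetical helper named to_indexed so it can be exercised on its own: (.+) is greedy, so it first takes the whole value and then backs off until what remains is an underscore followed by digits at the very end ($). Underscores earlier in the name, as in FUNC_1V8_OLED_OUT, stay in group 1, and a value with no trailing _<digits> is returned unchanged.

```python
import re

def to_indexed(value):
    """Hypothetical helper: rewrite a trailing _<digits> as [<digits>]."""
    # Greedy (.+) backtracks to the LAST underscore that is followed by digits up to the end
    return re.sub(r"(.+)_(\d+)$", r"\1[\2]", value.strip())

for v in [" COMP_NUM_11", "FUNC_1V8_OLED_OUT_10", "CAP_6", "NO_SUFFIX", "A_1_2"]:
    print(to_indexed(v))
# COMP_NUM[11]
# FUNC_1V8_OLED_OUT[10]
# CAP[6]
# NO_SUFFIX
# A_1[2]
```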

2023-06-07

pgx2nnw82#

With re.search, the regex doesn't have to match the whole string, only part of it. What's more, you don't even need the csv reader to get what you want.

import re

data="""bom_a14 , COMP_NUM_0
bom_a17 , COMP_NUM_2
bom_a27 , COMP_NUM_11
bom_a35 , FUNC_1V8_OLED_OUT_7
bom_a38 , FUNC_1V8_OLED_OUT_9
bom_a39 , FUNC_1V8_OLED_OUT_10
bom_a46 , CAP_4
bom_a47 , CAP_3
bom_a48 , CAP_6"""

for line in data.split('\n'):
    match_data = re.search(r'(\w+)_(\d+)',line)
    if match_data:
        g1,g2=match_data.groups()
        print(f"{g1}[{g2}]")
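The difference between re.match and re.search can be checked on one of the lines from data: re.match only attempts a match at position 0, where bom_a14 has no _<digits> suffix, while re.search scans forward and finds COMP_NUM_0. The last call uses a hypothetical line (not in the question's data) to show a caveat of this approach: it assumes the first column never ends in _<digits> itself, because re.search returns the first hit it finds.

```python
import re

pattern = r'(\w+)_(\d+)'
line = "bom_a14 , COMP_NUM_0"

# re.match only tries position 0, and "bom_a14" offers no _<digits> there
print(re.match(pattern, line))                         # None
# re.search scans forward and finds COMP_NUM_0
print(re.search(pattern, line).groups())               # ('COMP_NUM', '0')

# Hypothetical line whose first column has its own _<digits> suffix:
# re.search stops at the first hit, which is now in column 1
print(re.search(pattern, "bom_14 , CAP_4").groups())   # ('bom', '14')
```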

2023-06-07
