Regex: extract columns and skip certain lines in a file during data processing

Posted by hfwmuf9z on 2023-06-07 in Other

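I am trying to process input.txt with the test.py script to extract the specific information shown in the expected output. I have a basic stub working, but the regular expressions clearly do not extract the column details I expect. The expected output is shown below for reference.
In general, I am looking for a [XXXYY] {TAG} pattern. Once that pattern is found, if the first column of the next line starts with J, I want to extract column 1, column 2 and the first 3 characters of column 3. I would also like to know how to skip the lines after [00033] GND (and [00272] POS_3V3) up to the next [XXXYY] {TAG} pattern. I am restricted to Python 2.7.5 and the re and csv libraries; I cannot use pandas.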

input.txt
<<< Test List >>>
Mounting Hole                   MH1            APBC_MH_3.2x7cm
Mounting Hole                   MH2            APBC_MH_3.2x7cm
Mounting Hole                   MH3            APBC_MH_3.2x7cm
Mounting Hole                   MH4            APBC_MH_3.2x7cm

[00001] DEBUG_SCAR_RX
        J1         B30     PIO37          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        R2         2       2              PASSIVE     4.7kR

[00002] DEBUG_SCAR_TX
        J1         B29     PIO36          PASSIVE     TRA6-70-01.7-R-4-7-F-UG

[00003] DYOR_DAT_0
        J2         B12     APB10_CC_P     PASSIVE     TRA6-70-01.7-R-4-7-F-UG

[00033] GND
        DP1        5       5              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        6       6              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        7       7              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V

[00271] POS_3.3V_INH
        Q2         3       DRAIN          PASSIVE     2N7002
        R34        2       2              PASSIVE     4.7kR

[00272] POS_3V3
        J1         B13     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J1         B14     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J2         B59     FETO_HDB       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
test.py
import re

# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()

# Process the data and extract the required information
result = []
component_name = ""
for line in content:
    line = line.strip()
    if line.startswith("["):
        s = re.sub(r"([\[0-9]+\]) (\w+)$", r"\2", line)
    elif line.startswith("J"):
        sp = re.sub(r"^(\w+)\s+(\w+)\s+(\w+)", r"\1\2", line)
        print("%s\t%s" % (s, sp))
Output
DEBUG_SCAR_RX   J1B30          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DEBUG_SCAR_TX   J1B29          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DYOR_DAT_0  J2B12     PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B13       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B14       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J2B59       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
Expected output
DEBUG_SCAR_RX   J1 B30 PIO
DEBUG_SCAR_TX   J1 B29 PIO
DYOR_DAT_0  J2 B12 APB
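In test.py the second re.sub rewrites the line instead of capturing the fields: its pattern consumes the first three tokens and replaces them with \1\2, so column 3 disappears while PASSIVE and the part number are left behind. Below is a minimal sketch of a capture-based alternative, assuming whitespace-separated columns as in input.txt. One pattern recognizes the [XXXYY] {TAG} header, and a second pulls column 1, column 2 and the first three characters of column 3 out of each J line through groups. The function name extract is hypothetical. This sketch does not skip the GND/POS_3V3 blocks, so the POS_3V3 J lines are still reported.

```python
import re

# "[00001] DEBUG_SCAR_RX" -> group(1) == "DEBUG_SCAR_RX"
HEADER_RE = re.compile(r'^\[\d+\]\s+(\S+)')
# "J1   B30   PIO37 ..." -> ("J1", "B30", "PIO"): \S{1,3} keeps at most 3 chars of column 3
J_ROW_RE = re.compile(r'^(J\S*)\s+(\S+)\s+(\S{1,3})')


def extract(lines):
    """Return (tag, col1, col2, col3[:3]) for every J row that follows a header."""
    rows = []
    tag = None  # no header seen yet
    for line in lines:
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            tag = header.group(1)
            continue
        row = J_ROW_RE.match(line)
        if row and tag is not None:
            rows.append((tag,) + row.groups())
    return rows
```

To run it on the file, pass the open file object, e.g. with open('input.txt') as f: rows = extract(f). The code uses only re and works the same on Python 2.7 and 3.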
Answer 1 (zengzsys)

Perhaps you could use something like this:

import re

# Only J rows under these tags are kept (a whitelist)
TAGS = ['DEBUG_SCAR_RX', 'DEBUG_SCAR_TX', 'DYOR_DAT_0']

data = []
tag = None  # no [XXXYY] header seen yet
with open('input.txt') as file:
    for row in file:
        row = row.strip()
        if row.startswith('['):
            # "[00001] DEBUG_SCAR_RX" -> "DEBUG_SCAR_RX"
            tag = row.split(']')[1].strip()
        elif row == '':
            continue
        else:
            cols = re.split(r'\s+', row)
            if cols[0].startswith('J') and tag in TAGS:
                data.append([tag, cols[0], cols[1], cols[2][:3]])

Output:

# '2.7.18 (default, Jan 23 2023, 08:22:06) \n[GCC 12.2.0]'
>>> data
[['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
 ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
 ['DYOR_DAT_0', 'J2', 'B12', 'APB']]
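The answer above whitelists the wanted tags in TAGS. The question instead asks to skip everything after [00033] GND and [00272] POS_3V3 until the next [XXXYY] {TAG} header. Below is a minimal sketch of that blacklist approach, assuming the tags to skip are known in advance. SKIP_TAGS and extract_skipping are hypothetical names. A flag is switched on when a skipped header is seen, and every new header re-evaluates it.

```python
import re

# Hypothetical: headers whose following lines should be ignored
SKIP_TAGS = {'GND', 'POS_3V3'}
HEADER_RE = re.compile(r'^\[\d+\]\s+(\S+)')


def extract_skipping(lines, skip_tags=SKIP_TAGS):
    """Collect [tag, col1, col2, col3[:3]] for J rows outside skipped blocks."""
    data = []
    tag = None
    skipping = False
    for row in lines:
        row = row.strip()
        header = HEADER_RE.match(row)
        if header:
            # Every new [XXXYY] {TAG} header ends the previous (possibly skipped) block
            tag = header.group(1)
            skipping = tag in skip_tags
            continue
        if skipping or tag is None or not row:
            continue
        cols = re.split(r'\s+', row)
        if cols[0].startswith('J') and len(cols) > 2:
            data.append([tag, cols[0], cols[1], cols[2][:3]])
    return data
```

On the input.txt from the question, this should yield only the DEBUG_SCAR_RX, DEBUG_SCAR_TX and DYOR_DAT_0 rows, because POS_3.3V_INH contains no J lines and the GND and POS_3V3 blocks are skipped.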
Answer 2 (vzgqcmou)

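You really don't need re for something this trivial.
Just read the input file one line at a time and check whether the line starts with an opening bracket. If it does, save the key (the tag name), read the next line and split it into tokens, then check whether the first character of the first token is "J". If so, print the data you need: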

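with open('input.txt') as data:
    for line in data:
        if line.startswith('['):
            k = line.split()[-1]  # tag name, e.g. DEBUG_SCAR_RX
            dl = next(data, '').split()  # tokens of the first line after the tag
            if len(dl) > 2 and dl[0][0] == 'J':
                # a single formatted string prints the same on Python 2.7 and 3
                print('%s %s %s %s' % (k, dl[0], dl[1], dl[2][:3]))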

Output:

DEBUG_SCAR_RX J1 B30 PIO
DEBUG_SCAR_TX J1 B29 PIO
DYOR_DAT_0 J2 B12 APB
POS_3V3 J1 B13 FET
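This output still contains POS_3V3 J1 B13 FET, which the question's expected output omits, so filtering on the tag is still needed. Since the question allows the csv module, the collected rows can also be written with csv.writer instead of print. Below is a minimal sketch, assuming tab-separated output is wanted (the question does not name a delimiter) and that rows have the [tag, col1, col2, col3] shape of data in the first answer. write_rows is a hypothetical helper.

```python
import csv
import sys


def write_rows(rows, out=sys.stdout):
    """Write [tag, col1, col2, col3] rows as tab-separated lines using csv."""
    # lineterminator='\n' avoids the default '\r\n' line endings
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in rows:
        writer.writerow(row)
```

Any object with a write() method works as out, so the same call can target sys.stdout or a file opened for writing.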
