Can regex transformations be stored in a dictionary?

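I have thousands of lines of text that I want to process with regexes. I am doing it in two steps, but I don't think my second step is possible.

Step 1: Select the correct regex pattern from a dictionary
Step 2: Apply the correct transformation, also taken from a dictionary, to create the new output

I'm trying to confirm that there is no way to store the transformation instructions in a dictionary.
I could solve this differently by writing a decode function for each type and then keeping a dictionary of decode functions, but I'm curious whether there is a way to do it along the lines of my first attempt (below).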
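
For reference, the alternative mentioned above (a dictionary of decode functions) could look roughly like the sketch below. It assumes each decoder is a callable that takes the match object, so the transformation is only evaluated once a match exists. The names `patterns`, `decoders` and `decode` are hypothetical, and only a subset of the field codes is shown.

```python
import re

# Patterns keyed by field code (a subset of the patterns in this question;
# the '$' is escaped so it matches a literal dollar sign).
patterns = {
    "D": re.compile(r"(D)(\d{1,2})/\s*(\d{1,2})'\s*(\d{1,2})$"),
    "T": re.compile(r"(T)([+-]?\d+(\.\d{1,2})?)"),
    "$": re.compile(r"(\$)(.+)"),
}

# Decoders keyed the same way. Each value is a function of the match
# object, so nothing runs until decode() calls it with a real match.
decoders = {
    "D": lambda m: f"20{int(m.group(4)):02d}-{m.group(2)}-{m.group(3)}",
    "T": lambda m: float(m.group(2)),
    "$": lambda m: float(m.group(2)),
}

def decode(line):
    """Return the decoded value for one line, or None if nothing applies."""
    code = line[:1]
    pattern = patterns.get(code)
    if pattern is None:
        return None
    match = pattern.match(line)
    if match is None:
        return None
    return decoders[code](match)

print(decode("D3/25' 9"))   # 2009-3-25
print(decode("T-106.00"))   # -106.0
```

The difference from an expression stored directly as a dictionary value is timing: a plain expression is evaluated when the dictionary is built, before any match object exists, whereas a lambda defers the work until it is called.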

text_lines = """
D3/25' 9
U-106.00
T-106.00
CX
PPlayHouse
LSchooling:School Fees
SSchooling:School Fees
EWeek 14
$-38.00
SSchooling:School Fees
EWeek 15
$-19.00
SSchooling:School Fees
EWeek 16
$-38.00
SSchooling:School Fees
$-11.00
"""
dict_regex_patterns = {
    'AccountDefinitions': {
        "D": r'(D)(\d{1,2})/\s*(\d{1,2})\'\s*(\d{1,2})$',
        "S": r'(S)(.+)', 
        "E": r'(E)(.+)', 
        "$": r'(\$)(.+)',
        ...etc
        },
    'CatDefinitions': {
        "T": r'(T)([+-]?\d+(\.\d{1,2})?)',
        "U": r'(U)(.+)', 
        "C": r'(C)(.+)', 
        "P": r'(P)(.+)', 
        "L": r'(L)(.+)',
        ...etc
        },
}

# ------------------this won't work--------------------
dict_decoders = {
      'AccountDefinitions': {
        "D" : '20'+"{:02d}".format(int(source.group(4))) +f"-{source.group(2)}-{source.group(3)}",
        ...etc
        }
}
# ------------------this won't work--------------------

....

import re

current_column_name = None
lines = text_lines.splitlines()
for line in lines:
    # Skip blank lines; the first character of a line is its field code
    if len(line) >= 1:
        current_column_name = line[0]
        # fragmentType is assumed to be set in the elided code above
        regXpatterns = dict_regex_patterns[fragmentType]
        try:
            matcher = re.match(regXpatterns[current_column_name], line)
        except KeyError:
            print('Error: No regex pattern defined for field value type ', current_column_name)
            continue

        print(matcher)

        # The date conversion is hard-coded here; this is the part I would
        # like to look up from a dictionary instead
        if matcher and current_column_name == "D":
            answer = '20' + "{:02d}".format(int(matcher.group(4))) + f"-{matcher.group(2)}-{matcher.group(3)}"
            print(line, ' <-> ', answer)

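This is for a quirky little file format called "QIF" (https://en.wikipedia.org/wiki/Quicken_Interchange_Format), and I want to write my own script for it. QIF in, CSV out.
For a number of reasons, not least the peculiar implementation of this file format that I am dealing with, I want to create a function that converts a few lines of text into a CSV-style record set.
Input: as above. Desired output: as below.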

| D | U | T | C | P | L | S | E | $ |
| --|--|--|--|--|--|--|--|--|
| 3/25' 9 | -106.00 | -106.00 | X | Playhouse | Schooling:School Fees | Schooling:School Fees | Week 14 | -38.00 |
| 3/25' 9 | -106.00 | -106.00 | X | Playhouse | | Schooling:School Fees | Week 15 | -19.00 |
| 3/25' 9 | -106.00 | -106.00 | X | Playhouse | | Schooling:School Fees | Week 16 | -38.00 |
| 3/25' 9 | -106.00 | -106.00 | X | Playhouse | | Schooling:School Fees | | -11.00 |
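
A function of the kind described above could be sketched with only the standard library. This is a rough illustration, not a full QIF parser: it assumes that `D`, `U`, `T`, `C` and `P` are transaction-level fields repeated on every row, that each `$` line closes one row (both inferred from the desired table), and that a `^` line ends a record. The names `lines_to_records` and `records_to_csv` are hypothetical.

```python
import csv
import io

HEADER_CODES = "DUTCP"     # transaction-level fields, repeated on every row
FIELD_ORDER = "DUTCPLSE$"  # column order of the desired output

def lines_to_records(text):
    """Turn QIF-like lines into one dict per split (one per '$' line)."""
    header, split, rows = {}, {}, []
    for line in text.splitlines():
        if not line:
            continue
        code, value = line[0], line[1:]
        if code == "^":            # end of record: start a new transaction
            header, split = {}, {}
            continue
        if code in HEADER_CODES:
            header[code] = value
        else:
            split[code] = value
        if code == "$":            # a split amount completes one output row
            rows.append({**header, **split})
            split = {}
    return rows

def records_to_csv(rows):
    """Serialise the rows as CSV text, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_ORDER), restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

sample = """D3/25' 9
U-106.00
T-106.00
CX
PPlayHouse
LSchooling:School Fees
SSchooling:School Fees
EWeek 14
$-38.00
SSchooling:School Fees
EWeek 15
$-19.00
SSchooling:School Fees
EWeek 16
$-38.00
SSchooling:School Fees
$-11.00
"""

rows = lines_to_records(sample)
print(records_to_csv(rows))
```

Because `L` is not treated as a transaction-level field here, it only appears in the first row, which matches the table above. Values are kept as raw strings; any per-field decoding would be a separate step.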

Edit #2 - Using Quiffen
produces the following:

C:\Users\Maxcot>python
Python 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from quiffen import Qif, QifDataType
>>> import os
>>> import decimal
>>> folder = r'C:\Users\Maxcot\Desktop\Files'
>>> sourcefile = os.path.join(folder,'Exported.qif')
>>> qif = Qif.parse(sourcefile, day_first=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python311\Lib\site-packages\quiffen\core\qif.py", line 195, in parse
    new_category = Category.from_list(sanitised_section_lines)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\quiffen\core\category.py", line 466, in from_list
    raise ValueError(f"Unknown line code: {line_code}")
ValueError: Unknown line code: t
>>>

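I have just learned what a .qif file is, but your example shows a string.
"I want to write my own script. QIF in, CSV out."
You probably want a parser like quiffen rather than a regex approach.
Alternatively, if you are happy to use pandas, you can do a classic pivot: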

#pip install pandas
import pandas as pd
from io import StringIO

df = (
    (tmp:=pd.read_csv(
        StringIO(text_lines), header=None, sep=r"(?<=^[A-Z$])", engine="python"))
        .assign(transaction=lambda x: x.groupby(0).cumcount().add(1))
        .pivot(index="transaction", columns=0, values=1).pipe(
            lambda x: x.assign(**{c: x[c].ffill() for c in list("DUTCP")})) # op ?
        .rename_axis(columns=None)[tmp[0].unique()]
        # .fillna("") uncomment if needed
)

# df.to_csv("qif.csv", sep=",", index=True) # uncomment to make a `.csv`

Output:

print(df)

                   D        U        T  C          P                      L                      S        E       $
transaction                                                                                                        
1            3/25' 9  -106.00  -106.00  X  PlayHouse  Schooling:School Fees  Schooling:School Fees  Week 14  -38.00
2            3/25' 9  -106.00  -106.00  X  PlayHouse                    NaN  Schooling:School Fees  Week 15  -19.00
3            3/25' 9  -106.00  -106.00  X  PlayHouse                    NaN  Schooling:School Fees  Week 16  -38.00
4            3/25' 9  -106.00  -106.00  X  PlayHouse                    NaN  Schooling:School Fees      NaN  -11.00
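
The `sep` argument in the snippet above is a lookbehind that matches the empty position right after the first character of a line, provided that character is an uppercase letter or `$`. A small sketch of what that pattern does to a single line, using `re.split` directly (Python 3.7 or later, where splitting on empty matches is supported); how pandas applies the separator internally is not shown here, and `split_code` is a hypothetical helper name:

```python
import re

# Same pattern as the `sep` argument: an empty match right after a
# leading character that is an uppercase letter or '$'.
FIELD_SEP = re.compile(r"(?<=^[A-Z$])")

def split_code(line):
    """Split a line into [code, value] at the position after the code."""
    return FIELD_SEP.split(line)

print(split_code("PPlayHouse"))   # ['P', 'PlayHouse']
print(split_code("$-38.00"))      # ['$', '-38.00']
```

Since the match is zero-width, no characters are consumed, so the field code lands in the first column and the rest of the line in the second.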
