regex: Extract text with an optional group using a regex with re.VERBOSE

Posted by bd1hkmkf on 2023-08-08 in Other


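I am using re.VERBOSE to extract some information from HTM text. Because HKDMOPrate does not always appear in the text, I made it an optional group. However, the code does not work as expected. Here is my code: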

import re

def get_primerate_result(text):
    pattern=r"""
        (effect\sfrom\s)
        (?P<Date>[A-Za-z0-9\,\s]+)
        (\s\()
        (.*USD\sprime\srate)
        (.*to\s)
        (?P<USDrate>[0-9\.\%]+)
        (\sp\.a\.)
        ((.*HKD\sand\sMOP\sprime\srate)
        (.*to\s)
        (?P<HKDMOPrate>[0-9\.\%]+)
        (\sp\.a\.))?
    """

    dict_result=[i.groupdict() for i in re.finditer(pattern, text, re.VERBOSE)]
    return dict_result

Here are two sample inputs:

Input:
Text 1:

'Dear Customers,\nWith the Federal Reserve System raising its federal funds rate by 0.25%, our bank is here to announce that with effect from July 28, 2023 (Friday), our USD prime rate will be increased from 8.25% p.a. to 8.50% p.a., our HKD and MOP prime rate will be increased from 6.00% p.a. to 6.125% p.a.\nBank of China Limited Macau Branch\nBank of China (Macau) Limited\nJuly 27, 2023\nPlease click to check：\n\nPrime Rate\n'


Text 2:

'Dear Customers,\nWith the Federal Reserve System raising its federal funds rate by 0.25%, our bank is here to announce that with effect from March 24, 2023 (Friday), our USD prime rate will be increased from 7.75% p.a. to 8.00% p.a.\nBank of China Limited Macau Branch\nBank of China (Macau) Limited\nMarch 24, 2023\nPlease click to check\n\nPrime Rate\n'


Here is the output I want:

result 1: [{'Date': 'July 28, 2023', 'USDrate': '8.50%', 'HKDMOPrate': '6.125%'}]
result 2: [{'Date': 'March 24, 2023', 'USDrate': '8.00%', 'HKDMOPrate': None}]


Actual output:

result 1: [{'Date': 'July 28, 2023', 'USDrate': '6.125%', 'HKDMOPrate': None}]
result 2: [{'Date': 'March 24, 2023', 'USDrate': '8.00%', 'HKDMOPrate': None}]
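
A likely cause of the difference is the greedy `.*` in `(.*to\s)`: it runs to the last `to` on the line, so `USDrate` captures `6.125%` and nothing is left for the optional group to match. The sketch below is one possible fix, not necessarily the accepted one. It makes each `.*` lazy (`.*?`) and turns the unnamed groups into plain or non-capturing ones. The function name `get_primerate_result_lazy` is hypothetical, and `text1` and `text2` are shortened copies of the two sample inputs above.

```python
import re

def get_primerate_result_lazy(text):
    # Hypothetical variant of get_primerate_result: every ".*" is lazy (".*?"),
    # so each "to" stops at the first rate that follows it.
    pattern = r"""
        effect\sfrom\s
        (?P<Date>[A-Za-z0-9,\s]+)
        \s\(
        .*?USD\sprime\srate
        .*?\sto\s
        (?P<USDrate>[0-9.%]+)
        \sp\.a\.
        (?:                                  # optional HKD and MOP clause
            .*?HKD\sand\sMOP\sprime\srate
            .*?\sto\s
            (?P<HKDMOPrate>[0-9.%]+)
            \sp\.a\.
        )?
    """
    return [m.groupdict() for m in re.finditer(pattern, text, re.VERBOSE)]

# Shortened copies of the two sample inputs (assumption: the rest of each
# message does not affect the match).
text1 = ("our bank is here to announce that with effect from July 28, 2023 "
         "(Friday), our USD prime rate will be increased from 8.25% p.a. to "
         "8.50% p.a., our HKD and MOP prime rate will be increased from "
         "6.00% p.a. to 6.125% p.a.\nBank of China Limited Macau Branch\n")
text2 = ("our bank is here to announce that with effect from March 24, 2023 "
         "(Friday), our USD prime rate will be increased from 7.75% p.a. to "
         "8.00% p.a.\nBank of China Limited Macau Branch\n")

print(get_primerate_result_lazy(text1))
# [{'Date': 'July 28, 2023', 'USDrate': '8.50%', 'HKDMOPrate': '6.125%'}]
print(get_primerate_result_lazy(text2))
# [{'Date': 'March 24, 2023', 'USDrate': '8.00%', 'HKDMOPrate': None}]
```

Because `.` does not match a newline by default, the optional clause can only match on the same line as the USD rate. When no HKD and MOP clause is there, the group is skipped and `HKDMOPrate` stays `None`.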



Source: https://stackoverflow.com/questions/76833572/extract-text-with-optional-group-using-regex-with-re-verbose