regex: What is the best way to match optional whole words with a Python regular expression?

Posted by 3zwtqj6y on 2023-03-04 in Python

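I use regular expressions often, but usually in the same few ways. Occasionally I run into a case where I want to capture strings that contain optional whole words. I came up with the approach below, but I suspect there is a better way and I just don't know what it is. An example is a string like this: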
For the purposes of this order, the sum of $5,476,958.00 is the estimated total costs of the initial unit well covered hereby as dry hole and for the purposes of this order, the sum of $12,948,821.00 is the estimated total costs of such initial unit well as a producing well
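My goal is to capture the two parts of the string that start with a dollar sign $ and end with the word dry or prod. In the example the whole word is producing, but sometimes it is a variant such as production, so prod is a suitable stem. The captured result should be: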
['$5,476,958.00 is the estimated total costs of the initial unit well covered hereby as dry', '$12,948,821.00 is the estimated total costs of such initial unit well as a prod']
I got this working with a rather inelegant expression:
[val[0] for val in re.findall(r'(\$[0-9,\.]+[a-z ,]+total cost.*?(dry|prod)+)', line, flags=re.IGNORECASE)]
Is there a better, more correct way to do this?

Answer #1, by bvhaajcl

We can use re.findall here:

import re

inp = "For the purposes of this order, the sum of $5,476,958.00 is the estimated total costs of the initial unit well covered hereby as dry hole and for the purposes of this order, the sum of $12,948,821.00 is the estimated total costs of such initial unit well as a producing well"
matches = re.findall(r'\$\d{1,3}(?:,\d{3})*(?:\.\d+)?.*?\b(?:dry|prod)', inp)
print(matches)

This prints:

['$5,476,958.00 is the estimated total costs of the initial unit well covered hereby as dry',
 '$12,948,821.00 is the estimated total costs of such initial unit well as a prod']
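
One reason this version needs no val[0] indexing is that re.findall returns whole matches when the pattern has no capturing groups, and tuples of groups when it has more than one. The following is a small sketch of that difference, not part of the original answer; the shortened `line` string and the variable names are made up for illustration.

```python
import re

# Shortened sample, cut down from the string in the question.
line = ("the sum of $5,476,958.00 is the estimated total costs of the initial "
        "unit well covered hereby as dry hole")

# Pattern from the question: two capturing groups, so findall yields tuples.
with_groups = re.findall(
    r'(\$[0-9,\.]+[a-z ,]+total cost.*?(dry|prod)+)', line, flags=re.IGNORECASE)

# Pattern from the answer: only non-capturing (?:...) groups, so findall yields strings.
without_groups = re.findall(
    r'\$\d{1,3}(?:,\d{3})*(?:\.\d+)?.*?\b(?:dry|prod)', line)

print(with_groups)     # list with one (full match, keyword) tuple
print(without_groups)  # list with one plain string
```

Switching the groups to (?:...) is what lets the list comprehension from the question be dropped.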

Here is an explanation of the regex pattern used:

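  • \$ matches the currency symbol $
  • \d{1,3} matches 1 to 3 digits
  • (?:,\d{3})* followed by optional thousands groups (zero or more)
  • (?:\.\d+)? followed by an optional decimal part
  • .*? matches all content, as little as possible, up to the nearest occurrence of
  • \b(?:dry|prod) dry or prod at the start of a word, matched as a substring, so producing and production both match up to prod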
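
The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch: the names AMOUNT_TO_KEYWORD, WHOLE_WORD, and `sample` are hypothetical, the sample sentence is invented, and the \w* variant is an assumed tweak that is not part of the original answer.

```python
import re

# Same pattern as in the answer, spelled out so each part carries its own comment.
AMOUNT_TO_KEYWORD = re.compile(r"""
    \$                # literal dollar sign
    \d{1,3}           # one to three leading digits
    (?:,\d{3})*       # zero or more thousands groups such as ,476
    (?:\.\d+)?        # optional decimal part such as .00
    .*?               # as little text as possible ...
    \b(?:dry|prod)    # ... up to dry or prod at the start of a word
""", re.VERBOSE)

# Invented sample that uses "production" instead of "producing".
sample = "the sum of $1,250.50 is the estimated cost of a production well"
print(AMOUNT_TO_KEYWORD.findall(sample))

# Assumed variant, not from the answer: \w* extends the match to the end of the word.
WHOLE_WORD = re.compile(r'\$\d{1,3}(?:,\d{3})*(?:\.\d+)?.*?\b(?:dry|prod)\w*')
print(WHOLE_WORD.findall(sample))
```

Two caveats apply to either form: the pattern is case-sensitive unless re.IGNORECASE is passed, and .*? does not cross line breaks unless re.DOTALL is passed.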
