regex - Filter a pattern with float values using Python regex [duplicate]

Posted by igsr9ssn on 2023-10-22 in Python


This question already has an answer here:

Python - regex extract numbers from text that may contain thousands or millions separators and convert them to dot separated decimal floats（1个答案）
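Closed 18 days ago.
I am using Python 3.8 and regular expressions to filter strings that contain numeric values. How can I use re.search to match the decimal values in a string? I tried the [\d.] pattern, but it does not work for larger numbers.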

import re

s = 'e (s):    16,154.5    2,308,282.6'
s1 = re.search(r'e (s):(\s+)[\d.]+(\s+)[\d.]+',s)
print(s1.group())

Output

AttributeError: 'NoneType' object has no attribute 'group'
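For reference, a short sketch of why the search above returns None: in a pattern, ( and ) are grouping metacharacters, so e (s): means "e s:" and never matches the literal parentheses in the text; in addition, [\d.] has no comma, so it stops at 16. The snippet below assumes the label "e (s):" is fixed literal text and uses re.escape to escape it; it is one possible fix, not the approach of any particular answer.

```python
import re

s = 'e (s):    16,154.5    2,308,282.6'

# Unescaped "(s)" is a capture group that matches just "s",
# so this prefix never matches the "(s)" in the text.
print(re.search(r'e (s):', s))  # None

# re.escape escapes the parentheses (and any other special characters);
# adding "," to the character class lets thousands separators through.
prefix = re.escape('e (s):')
m = re.search(prefix + r'\s+([\d,.]+)\s+([\d,.]+)', s)
if m:
    print(m.groups())  # ('16,154.5', '2,308,282.6')
```

Note that [\d,.]+ still accepts malformed runs such as 1,,2..3; the answers below tighten the number pattern itself.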

regex

Source: https://stackoverflow.com/questions/77219089/filter-pattern-with-float-values-with-python-regex

4 answers


zpqajqem1#

Try the following pattern with the re.finditer method.

\d+(?:,\d+)*\.\d+

\d+ matches one or more digits (0 - 9)
(?:,\d+)* matches a comma followed by \d+, zero or more times
and \.\d+ matches a decimal point followed by \d+
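The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, where whitespace is ignored and # starts a comment. This is only a presentation change; DECIMAL_WITH_COMMAS is a name chosen here for illustration.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries a comment.
DECIMAL_WITH_COMMAS = re.compile(r"""
    \d+          # one or more digits
    (?:,\d+)*    # zero or more groups of: a comma followed by digits
    \.\d+        # a decimal point followed by one or more digits
""", re.VERBOSE)

s = 'e (s): 16,154.5    2,308,282.6'
matches = [m.group() for m in DECIMAL_WITH_COMMAS.finditer(s)]
print(matches)  # ['16,154.5', '2,308,282.6']
```

Because \.\d+ is mandatory here, a plain integer such as 89 is not matched at all.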

Here is an example.

import re
s = 'e (s): 16,154.5    2,308,282.6'
for x in re.finditer(r'\d+(?:,\d+)*\.\d+', s):
    print(x.group())

Output

16,154.5
2,308,282.6

Answered 2023-10-22

2eafrhcq2#

I suggest using re.findall to scan for all of these values.

>>> s = 'e (s): 16,154.5    2,308,282.6'
>>> re.findall(r'([\d,]+(?:\.\d+)?)', s)
['16,154.5', '2,308,282.6']

>>> s = 'e (s): 16,154.5    2,308,282.6,67.8'
>>> re.findall(r'\b([\d,]+(?:\.\d+)?)\b', s)
['16,154.5', '2,308,282.6', ',67.8']

Oops. We know the number will start with a digit.

>>> s = 'e (s): 16,154.5    2,308,282.6,67.8,89'
>>> re.findall(r'\b(\d(?:[\d,]+(?:\.\d+)?)?)\b', s)
['16,154.5', '2,308,282.6', '67.8', '89']

Or we can simplify it a bit: after the initial digit, [\d,] may occur zero or more times, and the decimal point with its trailing digits is optional:

>>> re.findall(r'\b(\d[\d,]*(?:\.\d+)?)\b', s)
['16,154.5', '2,308,282.6', '67.8', '89']
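One limitation of the simplified pattern is that [\d,]* accepts commas anywhere, so a comma-separated list of small numbers is swallowed as a single "number". A quick comparison against the stricter grouping pattern used in the refinement that follows (the sample string is made up for illustration):

```python
import re

loose = r'\b(\d[\d,]*(?:\.\d+)?)\b'             # any run of digits and commas
strict = r'\b(\d{1,3}(?:,\d{3})*(?:\.\d+)?)\b'  # a comma only before exactly three digits

s = '67,45,1,2.3'
print(re.findall(loose, s))   # ['67,45,1,2.3']
print(re.findall(strict, s))  # ['67', '45', '1', '2.3']
```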

We can refine this further.

>>> s = '  67,45,1,2.3,78,890,345.2'
>>> re.findall(r'\b(\d{1,3}(?:,\d{3})*(?:\.\d+)?)\b', s)
['67', '45', '1', '2.3', '78,890,345.2']

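Here, a number is one to three digits, followed by a comma and exactly three digits repeated zero or more times, then optionally a decimal point and at least one digit. The number must be bounded by word boundaries (\b).
At this point, getting all of these as floats is straightforward.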

>>> [float(re.sub(',', '', x))
...  for x in re.findall(r'\b(\d{1,3}(?:,\d{3})*(?:\.\d+)?)\b', s)]
[67.0, 45.0, 1.0, 2.3, 78890345.2]

Of course, there is one more thing we know: the number will start with a digit other than zero. That is an easy modification.

r'\b([1-9]\d{0,2}(?:,\d{3})*(?:\.\d+)?)\b'
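One side effect worth checking before adopting the [1-9] variant: \b also matches between a "." and a digit, so a value with a leading zero such as 0.5 is not skipped but partially matched as 5. A quick comparison on a hypothetical input string:

```python
import re

nonzero_start = r'\b([1-9]\d{0,2}(?:,\d{3})*(?:\.\d+)?)\b'
any_start = r'\b(\d{1,3}(?:,\d{3})*(?:\.\d+)?)\b'

s = 'rate 0.5 total 1,250.75'
print(re.findall(any_start, s))      # ['0.5', '1,250.75']
print(re.findall(nonzero_start, s))  # ['5', '1,250.75']  (the "0." is dropped)
```

Whether that matters depends on whether values below 1 can appear in the input.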


euoag5mw3#

You can match the regular expression:

^e \(s\): *(\d{1,3}(?:,\d{3})*\.\d+) *(\d{1,3}(?:,\d{3})*\.\d+) *$

If there is a match, capture groups 1 and 2 will hold the strings of interest.
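A minimal sketch of applying this anchored pattern to the question's string and turning the two captured groups into floats. Stripping the commas with str.replace is one choice made here; re.sub as in the previous answer works the same way.

```python
import re

PATTERN = r'^e \(s\): *(\d{1,3}(?:,\d{3})*\.\d+) *(\d{1,3}(?:,\d{3})*\.\d+) *$'

s = 'e (s):    16,154.5    2,308,282.6'
m = re.search(PATTERN, s)
if m:
    first, second = m.group(1), m.group(2)
    # Drop the thousands separators before converting to float.
    values = [float(g.replace(',', '')) for g in (first, second)]
    print(values)  # [16154.5, 2308282.6]
else:
    print('No match found')
```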
If you use Python's alternative regex engine from PyPI (the regex module), the expression can be simplified with a subroutine call:

^e \(s\): *(\d{1,3}(?:,\d{3})*\.\d+) *(?1) *$

Here (?1) matches the expression defined in the first capture group, that is, (\d{1,3}(?:,\d{3})*\.\d+).
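The standard-library re module does not support (?1) subroutine calls. If installing the third-party module is not an option, one workaround (a sketch, not part of the answer above) is to define the number sub-pattern once as a string and interpolate it twice; NUMBER and PATTERN are illustrative names.

```python
import re

# The braces in {1,3} and {3} live inside the NUMBER value, so the f-string
# does not try to interpret them.
NUMBER = r'\d{1,3}(?:,\d{3})*\.\d+'
PATTERN = re.compile(rf'^e \(s\): *({NUMBER}) *({NUMBER}) *$')

m = PATTERN.match('e (s):    16,154.5    2,308,282.6')
print(m.groups() if m else None)  # ('16,154.5', '2,308,282.6')
```

Unlike (?1), this duplicates the sub-pattern in the compiled regex, but the source only states it once.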


44u64gxh4#

My suggestion:

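import re

s = 'e (s): 16,154.5    2,308,282.6'
# Escape the parentheses so "(s)" is matched literally, then accept any run
# of whitespace, digits, commas and dots after the colon.
s1 = re.search(r'e \(s\):[\s\d,\.]+', s)

if s1:
    print(s1.group())
else:
    print("No match found")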
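If the individual values are needed rather than the whole matched text, one possible follow-up is to split what comes after the colon and convert each piece. This sketch assumes the matched tail holds only well-formed numbers: since the character class accepts any mix of digits, commas, dots and whitespace, a fragment such as 1..2 would also be captured and make float raise ValueError.

```python
import re

s = 'e (s): 16,154.5    2,308,282.6'
s1 = re.search(r'e \(s\):[\s\d,\.]+', s)

if s1:
    # Everything after the first colon is a whitespace-separated list of numbers.
    raw = s1.group().split(':', 1)[1].split()
    numbers = [float(x.replace(',', '')) for x in raw]
    print(raw)      # ['16,154.5', '2,308,282.6']
    print(numbers)  # [16154.5, 2308282.6]
else:
    print('No match found')
```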

