regex: Unable to build the correct regular-expression pattern to extract the required part of a string

Posted by brccelvz on 2023-10-22 in Other

What is the correct pattern if I apply a regex to the following string?

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

The result I want is:

0104307101600-2023-SUNARP-TR

I tried:

import re

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
content = re.findall(r"(\('.+?'\))",item)[0].replace("'","").replace(",","")
print(content)

lxkprmvk #1

Try this:

>>> import re
>>> item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
>>> chunks = re.findall(r"'([^']+)'", item)
>>> chunks
['01', '043', '071', '01', '600%2D2023%2DSUNARP%2DTR']
>>> chunks[-1]
'600%2D2023%2DSUNARP%2DTR'
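
The chunks above still need to be joined and decoded to reach the string asked for in the question. A minimal sketch, assuming urllib.parse.unquote is an acceptable way to decode the %2D escapes (the second answer uses it too); the name result is only illustrative:

```python
import re
from urllib.parse import unquote

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

# Capture the text between each pair of single quotes.
chunks = re.findall(r"'([^']+)'", item)

# Join the pieces, then decode percent-escapes such as %2D ("-").
result = unquote("".join(chunks))
print(result)  # 0104307101600-2023-SUNARP-TR
```

Decoding after joining is enough here because none of the arguments contains a quote or comma once decoded.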

If matching on quotes alone feels too "loose", you can first grab the parenthesized region with something like this:

paren = re.search(r"\(([^)]+)\)", item)

Or, more precisely:

paren = re.search(r"JavaScript:muestra\(([^)]+)\)", item)

Then run the quote-based match from the snippet above on paren[1], the captured group.
Using re.split() also works:

[x for x in re.split(r"'(?:,')?", paren[1]) if x]
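
Put together, the two-stage approach might look like the sketch below. It assumes the string always starts with the JavaScript:muestra( prefix seen in the question; the names args, by_findall and by_split are only illustrative:

```python
import re

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

# Stage 1: isolate the argument list inside the parentheses.
paren = re.search(r"JavaScript:muestra\(([^)]+)\)", item)
if paren is None:
    raise ValueError("no muestra(...) call found")
args = paren[1]  # "'01','043',...": the first capture group

# Stage 2a: pull out each quoted value.
by_findall = re.findall(r"'([^']+)'", args)

# Stage 2b: or split on the quote separators and drop the empty
# strings that appear at the start and end of the result.
by_split = [x for x in re.split(r"'(?:,')?", args) if x]

print(by_findall == by_split)  # True
```

Checking for None before indexing avoids a TypeError when the input does not match the expected shape.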

Another approach is to use ast.literal_eval to convert the parenthesized substring into a tuple:

>>> import ast
>>> paren = re.search(r"\([^)]+\)", item)
>>> ast.literal_eval(paren[0])
('01', '043', '071', '01', '600%2D2023%2DSUNARP%2DTR')
>>> tup = ast.literal_eval(paren[0])
>>> tup[-1]
'600%2D2023%2DSUNARP%2DTR'

Boiled down to a one-liner, with no regex:

>>> ast.literal_eval(item.replace("JavaScript:muestra", ""))[-1]
'600%2D2023%2DSUNARP%2DTR'
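
The same idea can be extended to produce the full string from the question. This is only a sketch: it works because this particular JavaScript argument list happens to also be a valid Python tuple literal, and it assumes the fixed JavaScript:muestra prefix. The names prefix, args and code are illustrative:

```python
import ast
from urllib.parse import unquote

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

prefix = "JavaScript:muestra"
if not item.startswith(prefix):
    raise ValueError("unexpected input format")

# What remains after the prefix is "('01','043',...)", which
# ast.literal_eval parses as a tuple of strings without executing code.
args = ast.literal_eval(item[len(prefix):])

# Join all arguments and decode the %2D escapes into "-".
code = unquote("".join(args))
print(code)  # 0104307101600-2023-SUNARP-TR
```

Note that a call with a single argument, such as muestra('01'), would evaluate to a plain string rather than a tuple, so inputs of that shape may need separate handling.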

jckbn6z7 #2

from urllib.parse import unquote

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

data = ''.join(
    unquote(item).split('(')[-1].strip('()').replace('\'','').split(',')
)
print(data) # 0104307101600-2023-SUNARP-TR
