I am trying to handle unmatched double quotes within a string in CSV format.
To be precise,
"It "does "not "make "sense", Well, "Does "it"
should be corrected to
"It" "does" "not" "make" "sense", Well, "Does" "it"
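So basically what I am trying to do is replace every '"' that is
1. not preceded by the beginning of a line or a comma, and
2. not followed by a comma or the end of a line
with '" "'.
For this I use the regex below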
(?<!^|,)"(?!,|$)
The problem is that the Ruby regex engine (http://www.rubular.com/) is able to parse the regex, while the Python regex engines (https://pythex.org/, http://www.pyregex.com/) throw the following error
Invalid regular expression: look-behind requires fixed-width pattern
and in Python 2.7.3 it throws
sre_constants.error: look-behind requires fixed-width pattern
Can anyone tell me what bothers Python here?
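The error can be reproduced outside the online testers. The following is a minimal sketch using only the standard `re` module; the helper name `lookbehind_error` is made up for illustration. The lookbehind `(?<!^|,)` has one alternative of width 0 (`^`) and one of width 1 (`,`), which is what `re` refuses.

```python
import re

def lookbehind_error(pattern):
    """Return the compile error message for a pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# The pattern from the question: alternatives of width 0 (^) and width 1 (,).
print(lookbehind_error(r'(?<!^|,)"(?!,|$)'))

# A lookbehind with a single fixed width compiles without complaint.
print(lookbehind_error(r'(?<!,)"(?!,|$)'))
```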
Edit:
Following Tim's answer, I got the output below for a multi-line string
>>> str = """ "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it" """
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str)
' "It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" " '
At the end of each line, two double quotes were added next to "it".
So I made a very small change to the call to handle newlines.
re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
But this gives
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
' "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it" " '
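The last "it" still has two double quotes.
But I wonder why the '$' end-of-line anchor does not recognize that the line has ended.
The final answer is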
re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', str,flags=re.MULTILINE)
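A likely reason `$` alone did not help on the last line: in the sample string the closing triple quote is preceded by a space, so the final `"` is followed by a space rather than by the end of the line. Allowing `[ \t]*` before `$` covers that case. Below is a small sketch of the final pattern; the function name `fix_quotes` and the two-line `sample` input are illustrative assumptions.

```python
import re

# Final pattern from the question: leave a quote alone when it is followed by
# a comma, or by optional spaces/tabs and then the end of a line.
PATTERN = re.compile(r'\b\s*"(?!,|[ \t]*$)', flags=re.MULTILINE)

def fix_quotes(text):
    """Replace each unmatched quote after a word with a closing/opening pair."""
    return PATTERN.sub('" "', text)

line = '"It "does "not "make "sense", Well, "Does "it"'
# Hypothetical two-line input; the second line ends with a trailing space.
sample = line + "\n" + line + " "
print(fix_quotes(sample))
```

Each line should come back in the corrected form shown at the top of the question, with the trailing space left as it was.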
3 Answers
Answer 1
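Python re lookbehinds do need to be fixed-width. When a lookbehind pattern contains alternatives of different lengths, there are several ways to handle it:
- Rewrite the pattern so that no alternation is needed. For example, (?<=[^,])"(?!,|$) requires a character other than a comma before the double quote, and the common pattern for matching words surrounded by whitespace, (?<=\s|^)\w+(?=\s|$), can be written as (?<!\S)\w+(?!\S).
- Split a positive lookbehind into a group of alternated lookbehinds: (?<=a|bc) should be rewritten as (?:(?<=a)|(?<=bc)).
- When the lookbehind alternates an anchor with a single character, invert it and use a negated character class. (?<=\s|^) matches whitespace or the start of the string/line (if re.M is used), so in Python re use (?<!\S). Likewise, (?<=^|;) becomes (?<![^;]). If you also want to match at the start of a line, add \n to the negated character class, e.g. (?<![^;\n]) (see "Python Regex: Match start of line, or semi-colon, or start of string, none capturing group"). This is unnecessary for (?<!\S), because \S does not match a newline.
- Concatenate negative lookbehinds: (?<!^|,)"(?!,|$) should look like (?<!^)(?<!,)"(?!,|$).
Alternatively, just install the PyPI regex module with
pip install regex
(or pip3 install regex) and enjoy unlimited-width lookbehind.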
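The rewrites described in this answer can be tried with the standard `re` module. This is only a sketch: the variable names and the short test strings are invented for illustration, and `\d` and `\w+` are added after the lookbehinds just to have something to match.

```python
import re

# Negative lookbehinds split and concatenated; re accepts this form.
split_negative = re.compile(r'(?<!^)(?<!,)"(?!,|$)')

# (?<=a|bc) rewritten as a group of alternated fixed-width lookbehinds.
split_positive = re.compile(r'(?:(?<=a)|(?<=bc))\d')

# (?<=^|;) rewritten as an inverted lookbehind with a negated character class.
anchor_or_char = re.compile(r'(?<![^;])\w+')

print(split_positive.findall('a1 bc2 c3'))          # ['1', '2']
print(anchor_or_char.findall('ab;cd ef'))           # ['ab', 'cd']
print(len(split_negative.findall('"a "b", "c"')))   # 2
```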
Answer 2
Python lookbehind assertions need to be fixed-width, but you can try this:
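The code that followed this sentence appears to be missing here. Judging from the pattern quoted in the question's edit, it was probably along these lines (a reconstruction under that assumption, not the answerer's verbatim code). The pattern avoids lookbehind entirely by starting the match at a word boundary.

```python
import re

s = '"It "does "not "make "sense", Well, "Does "it"'

# \b        start the match at the end of a word
# \s*       optional whitespace before the quote
# "         the quote itself
# (?!,|$)   unless it is followed by a comma or the end of the string
result = re.sub(r'\b\s*"(?!,|$)', '" "', s)
print(result)
```

For the single-line input above this produces the corrected string from the question.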
Answer 3
The simplest solution is to switch to the regex module, which supports look-behind patterns of different lengths.
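As a rough sketch of what that switch might look like, assuming the third-party regex package is installed (`pip install regex`): the original pattern from the question is passed through unchanged. The helper `count_matches` is hypothetical and returns None when the package is not available.

```python
# Assumption: the third-party regex package is installed (pip install regex).
try:
    import regex
except ImportError:  # keep the sketch usable without the package
    regex = None

ORIGINAL = r'(?<!^|,)"(?!,|$)'  # pattern from the question, unchanged

def count_matches(text):
    """Count quotes matched by the original pattern, or None without regex."""
    if regex is None:
        return None
    return len(regex.findall(ORIGINAL, text))

print(count_matches('"It "does "not "make "sense", Well, "Does "it"'))
```

Note that compiling the pattern is a separate question from whether it selects the intended quotes; the word-boundary pattern from the question's edit behaves differently on quotes that follow a comma and a space.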