regex: Split a string by comma, but ignore commas within brackets

Posted by wi3ka0sx on 2023-01-27 in Other


I am trying to split a string by commas using Python:

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

But I want to ignore any commas inside the brackets [], so the result for the string above would be:

["year:2020", "concepts:[ab553,cd779]", "publisher:elsevier"]

Does anyone have advice on how to do this? I tried using re.split like so:

params = re.split(",(?![\w\d\s])", param)

But it doesn't work properly.

regex

Source: https://stackoverflow.com/questions/70684603/split-string-by-comma-but-ignore-commas-within-brackets

5 Answers


qq24tv8q1#

result = re.split(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])", subject, 0)

,                 # Match the character “,” literally
(?!               # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   (?:               # Match the regular expression below
      [^,\[\]]          # Match any single character NOT present in the list below
                           # The literal character “,”
                           # The literal character “[”
                           # The literal character “]”
         +                 # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ,                 # Match the character “,” literally
   )
      *                 # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   [^,\[\]]          # Match any single character NOT present in the list below
                        # The literal character “,”
                        # The literal character “[”
                        # The literal character “]”
      +                 # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ]                 # Match the character “]” literally
)

Updated to support more than 2 items inside the brackets, for example:

year:2020,concepts:[ab553,cd779],publisher:elsevier,year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier
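As a quick check, the pattern above can be run against the sample string from the update. This is only a minimal sketch: the names `PATTERN`, `split_outside_brackets`, and `sample` are hypothetical and not part of the original answer, and it assumes brackets are balanced and not nested.

```python
import re

# Pattern from the answer: split on a comma unless the text after it is
# made of bracket-free, comma-separated items followed by a closing "]".
PATTERN = re.compile(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])")


def split_outside_brackets(text):
    """Split text on commas that are not inside [...] (hypothetical helper)."""
    return PATTERN.split(text)


sample = (
    "year:2020,concepts:[ab553,cd779],publisher:elsevier,"
    "year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier"
)

for field in split_outside_brackets(sample):
    print(field)
```

Compiling the pattern once keeps the call site short and avoids passing `maxsplit` positionally.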


yv5phkfx2#

This regex works for your example:

,(?=[^,]+?:)

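Here we use a positive lookahead to find commas that are followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key> pattern you are searching for. Of course, if keys are allowed to contain commas, this would need further adjustment.
You can try it out on regexr.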
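A minimal sketch of this pattern used with re.split follows. The name `KEY_BOUNDARY` and the second input `tricky` are hypothetical; the second input illustrates a related caveat that the answer does not state, namely that a colon inside a bracketed value also looks like the start of a key.

```python
import re

# Split only at commas that are followed by "<non-comma chars>:",
# i.e. commas that appear to introduce a new key.
KEY_BOUNDARY = re.compile(r",(?=[^,]+?:)")

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
print(KEY_BOUNDARY.split(s))
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

# Hypothetical input: a colon inside the brackets is mistaken for a key.
tricky = "year:2020,concepts:[ab553,id:cd779],publisher:elsevier"
print(KEY_BOUNDARY.split(tricky))
# ['year:2020', 'concepts:[ab553', 'id:cd779]', 'publisher:elsevier']
```

This approach never looks at the brackets at all, so it suits data where every top-level field has the form key:value and values contain no colons.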


hrirmatl3#

You can solve this with a user-defined function instead of split:

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

def split_by_commas(s):
    lst = list()
    last_bracket = ''
    word = ""
    for c in s:
        if c == '[' or c == ']':
            last_bracket = c
        if c == ',' and last_bracket == ']':
            lst.append(word)
            word = ""
            continue
        elif c == ',' and last_bracket == '[':
            word += c
            continue
        elif c == ',':
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
main_lst = split_by_commas(s)

print(main_lst)

Result of running the code above:

['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']


t3psigkw4#

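If you use a pattern that only has a lookahead asserting the characters to the right, nothing asserts that there is a matching opening bracket on the left.
Instead of splitting, you can match the fields: either one or more repetitions of a value in square brackets, or any run of characters other than a comma.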

(?:[^,]*\[[^][]*])+[^,]*|[^,]+


import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
# Match whole fields instead of splitting: bracketed values first, then plain runs
params = re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", s)
print(params)

Output

['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
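To illustrate the point about lookahead-only patterns, the sketch below compares the two approaches on a malformed input with a "]" that has no "[" before it. The input `unbalanced` and the names `LOOKAHEAD_SPLIT` and `FIELD` are hypothetical; the lookahead pattern is the one from the first answer in this thread.

```python
import re

# Lookahead-only split: only checks what follows the comma.
LOOKAHEAD_SPLIT = re.compile(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])")
# Field matcher from this answer: a bracketed value needs both "[" and "]".
FIELD = re.compile(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+")

# Hypothetical malformed input: a closing bracket without an opening one.
unbalanced = "x:1,y:2],z:3"

print(LOOKAHEAD_SPLIT.split(unbalanced))  # ['x:1,y:2]', 'z:3']
print(FIELD.findall(unbalanced))          # ['x:1', 'y:2]', 'z:3']
```

One trade-off worth knowing: because findall collects matches rather than splitting, empty fields are dropped, so "a,,b" yields two items where re.split would yield three.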


wmtdaxz35#

I adapted @Bemwa's solution (which didn't work for my use case):

def split_by_commas(s):
    lst = list()
    brackets = 0
    word = ""
    for c in s:
        if c == "[":
            brackets += 1
        elif c == "]":
            if brackets > 0:
                brackets -= 1
        elif c == "," and not brackets:
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
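The answer does not say which use case failed. One plausible difference, shown here purely as an assumption, is nested brackets: tracking only the last bracket seen treats the first "]" as the end of the list, while a depth counter waits for the outermost one. In this sketch `split_last_bracket` is a condensed, behavior-equivalent version of the earlier function, `split_by_depth` restates the function above, and the input `nested` is hypothetical.

```python
def split_last_bracket(s):
    """Condensed version of the earlier approach: remember the last bracket seen."""
    parts, word, last_bracket = [], "", ""
    for c in s:
        if c in "[]":
            last_bracket = c
        if c == "," and last_bracket != "[":
            parts.append(word)
            word = ""
            continue
        word += c
    parts.append(word)
    return parts


def split_by_depth(s):
    """Depth-counter approach: split only when no bracket is open."""
    parts, word, depth = [], "", 0
    for c in s:
        if c == "[":
            depth += 1
        elif c == "]" and depth > 0:
            depth -= 1
        elif c == "," and depth == 0:
            parts.append(word)
            word = ""
            continue
        word += c
    parts.append(word)
    return parts


nested = "a:[x,[y,z],w],b:1"  # hypothetical nested input
print(split_last_bracket(nested))  # ['a:[x,[y,z]', 'w]', 'b:1']
print(split_by_depth(nested))      # ['a:[x,[y,z],w]', 'b:1']
```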

