Regex case-sensitivity flag (?-i) does not work in Jupyter notebook [duplicate]

Posted by o8x7eapl on 2023-01-21 in Other

restrict 1 word as case sensitive and other as case insensitive in python regex | (pipe) (2 answers)
Closed 3 days ago.
I am trying to extract company names from text. Sample text:
"ABC Private Limited (Wholesale) is the largest company."
Regex used:

\b(?:(?-i)[A-Z][a-zA-Z()\.]*\s){2,5}

It correctly identifies the company name on https://regexr.com/
But when I try the same thing in a Jupyter notebook, I get an error.

combined_df['company'] = combined_df['subject_link_text'].str.findall(r"\b(?:(?-i)[A-Z][a-zA-Z()\.]*\s){2,5}")

Error:

Thanks in advance for your help.

2 Answers

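I thought the case-insensitive flag was (?i), not (?-i). Note that Python only accepts a global inline flag such as (?i) at the very start of the pattern (since Python 3.11 it is an error anywhere else, and earlier versions emit a DeprecationWarning), so it goes at the front. Try the following:

combined_df['company'] = combined_df['subject_link_text'].str.findall(r"(?i)\b(?:[A-Z][a-zA-Z()\.]*\s){2,5}")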
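For context on why the pattern from the question is rejected: as far as I can tell, Python's re module only accepts flag removal in the scoped form (?-i:...), available since Python 3.6, and raises re.error for a bare (?-i). The sketch below is a minimal check of that behaviour. The helper name compiles and the English sample sentence are assumptions made for illustration, not part of the original post.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Bare flag removal, exactly as written in the question: rejected by re.
bare_ok = compiles(r"\b(?:(?-i)[A-Z][a-zA-Z()\.]*\s){2,5}")

# Scoped flag removal: only the leading [A-Z] is forced to be case sensitive.
scoped = r"\b(?:(?-i:[A-Z])[a-zA-Z()\.]*\s){2,5}"
scoped_ok = compiles(scoped)

# Assumed sample sentence. Even with re.I switched on for the whole pattern,
# every word in a match must still start with a capital letter.
sample = "ABC Private Limited (Wholesale) is the largest company."
matches = re.findall(scoped, sample, flags=re.I)

print(bare_ok, scoped_ok, matches)
```

The scoped form is useful when the rest of the pattern really does need to be case insensitive; if nothing else relies on re.I, dropping the flag entirely is simpler.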

Alternatively, just pass re.I through the flags option to make the match case insensitive:

import re
combined_df['company'] = combined_df['subject_link_text'].str.findall(r"\b(?:[A-Z][a-zA-Z()\.]*\s){2,5}", flags=re.I)
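One thing to keep in mind with re.I is that [A-Z] then also matches lowercase letters, so runs of ordinary lowercase words can be picked up as well. The comparison below is a small sketch using re.findall directly on an assumed English version of the sample sentence; the variable names strict and relaxed are made up for illustration.

```python
import re

pattern = r"\b(?:[A-Z][a-zA-Z()\.]*\s){2,5}"

# Assumed sample sentence, not taken verbatim from the original post.
sample = "ABC Private Limited (Wholesale) is the largest company."

# Without flags, each word in a match must start with an uppercase letter.
strict = re.findall(pattern, sample)

# With re.I, [A-Z] also accepts lowercase letters, so lowercase runs match too.
relaxed = re.findall(pattern, sample, flags=re.I)

print(strict)
print(relaxed)
```

If the goal is to keep only capitalised names, the case-sensitive call is probably the one to use.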

By the way, the flag seems redundant here; \b(?:[A-Z][a-zA-Z()\.]*\s){2,5} should do the job on its own (check it on regex101):

combined_df['company'] = combined_df['subject_link_text'].str.findall(r"\b(?:[A-Z][a-zA-Z()\.]*\s){2,5}")
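To try the flag-free pattern outside a DataFrame, here is a minimal sketch that applies re.findall to each string in a plain list, which is what Series.str.findall does per element according to the pandas documentation. The names COMPANY_RE and extract_companies and the sample rows are hypothetical. Because the pattern ends each word with \s, every match carries a trailing space, so the sketch strips it.

```python
import re
from typing import List

# Flag-free pattern from the answer above, compiled once for reuse.
COMPANY_RE = re.compile(r"\b(?:[A-Z][a-zA-Z()\.]*\s){2,5}")


def extract_companies(texts: List[str]) -> List[List[str]]:
    """Return the capitalised word runs found in each text, whitespace stripped."""
    return [[m.strip() for m in COMPANY_RE.findall(text)] for text in texts]


# Assumed sample rows, standing in for combined_df['subject_link_text'].
rows = [
    "ABC Private Limited (Wholesale) is the largest company.",
    "no capitalised run here",
]
result = extract_companies(rows)
print(result)
```

In pandas the same stripping could be done after str.findall, for example by mapping a list comprehension over the resulting column.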