Regular expression works in Regex101 but not in Jupyter Notebook

Posted by stszievb on 2023-06-25 in Other


import re
with open('names.txt') as f:
    data = f.readlines()

twitter_pattern = re.compile(r"\s{1}[@]\w+")

twitter_match = twitter_pattern.findall(str(data))
print(twitter_match)

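names.txt is a list of full names, phone numbers, and Twitter handles. The pattern \s{1}[@]\w+ should return only the Twitter handles, but it returns an empty list. Everything seems to be working fine in regex101, but not when I run it in Jupyter Notebook.
The contents of the file are identical to the data provided in the Regex101 link: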

Osterberg, Sven-Erik    governor@norrbotten.co.se       Governor, Norrbotten    @sverik
, Tim   tim@killerrabbit.com        Enchanter, Killer Rabbit Cave
Butz, Ryan  ryanb@codingtemple.com  (555) 555-5543  CEO, Coding Temple  @ryanbutz
Doctor, The doctor+companion@tardis.co.uk       Time Lord, Gallifrey
Exampleson, Example me@example.com  555-555-5552    Example, Example Co.    @example
Pael, Ripal ripalp@codingtemple.com (555) 555-5553  Teacher, Coding Temple  @ripalp

regex

Source: https://stackoverflow.com/questions/76451727/regular-expression-works-in-regex101-but-not-jupyter-notebook

1 Answer


gc0ot86w1#

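readlines() reads the file as a list of strings, one per line, and each string keeps its trailing newline.
File

Hello
World

List ['Hello\n', 'World\n']
str(data) is the text representation of that list. In Python, this is the literal text ['Hello\n', 'World\n'], where each newline has been replaced by the two-character escape sequence \n (a backslash followed by n). Tabs are escaped the same way, as \t.
In your case, this means the text gains [ and ] plus a lot of extra quotes and commas. Assuming the columns in names.txt are separated by tabs, the whitespace in front of each Twitter handle also turns into the characters \t, which \s does not match, so findall() returns an empty list.
To fix the code, don't read the file as a list; read it as text instead.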

with open('names.txt') as f:
    data = f.read()              # read the whole file as one string, instead of readlines()
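
The difference can be reproduced without a file. The following is a minimal sketch, not the asker's actual data: the two-line `sample` string is made up for illustration, and it assumes the columns are separated by tabs. `splitlines(keepends=True)` stands in for what readlines() would return.

```python
import re

# Hypothetical two-line sample; the columns are ASSUMED to be tab-separated.
sample = (
    "Butz, Ryan\tCEO, Coding Temple\t@ryanbutz\n"
    "Pael, Ripal\tTeacher, Coding Temple\t@ripalp\n"
)

twitter_pattern = re.compile(r"\s@\w+")

# Stand-in for readlines(): one string per line, trailing newline kept.
lines = sample.splitlines(keepends=True)

# str() of a list is its repr: tabs and newlines become the literal
# two-character sequences \t and \n, so \s finds no whitespace before "@".
as_list_text = str(lines)
print(as_list_text)
print(twitter_pattern.findall(as_list_text))  # []

# Stand-in for read(): the raw text still contains real tab characters.
print(twitter_pattern.findall(sample))  # ['\t@ryanbutz', '\t@ripalp']
```

If the real file turns out to use spaces rather than tabs before the handles, the first findall() would not be empty, so checking print(repr(data)) in the notebook is a quick way to see what the pattern is really being matched against.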

Also, don't make your regular expression more complicated than necessary. \s@\w+ is equivalent, but less confusing.
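
As a quick check of that equivalence, here is a small sketch on a made-up `text` string (not the asker's file). The capture-group variant at the end is an optional extra, not part of the original answer: it returns the handles without the leading whitespace.

```python
import re

# Hypothetical sample: one handle after a tab, one after a space,
# and an email address that should not be picked up.
text = "Butz, Ryan\t@ryanbutz\nExampleson, Example @example\nme@example.com\n"

verbose = re.compile(r"\s{1}[@]\w+")
simple = re.compile(r"\s@\w+")

# Both patterns return the same matches, leading whitespace included.
assert verbose.findall(text) == simple.findall(text)
print(simple.findall(text))  # ['\t@ryanbutz', ' @example']

# With a single capture group, findall() returns only the group,
# i.e. the bare handle without the whitespace in front of it.
handles = re.findall(r"\s(@\w+)", text)
print(handles)  # ['@ryanbutz', '@example']
```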
