How to find ellipses in a text string in Python?

Posted by yv5phkfx on 2022-12-10 in Python

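I'm still new to Python (and Stack Overflow!). I have a dataset containing subject-line data (text strings), which I'm using to build a bag-of-words model. I'm creating new variables that flag 0 or 1 for various possible scenarios, but I'm stuck trying to identify where the text contains an ellipsis ("..."). Here is my starting point: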

Data_Frame['Elipses'] = Data_Frame.Subject_Line.str.match('(\w+)\.{2,}(.+)')

Entering ('...') doesn't work for obvious reasons, but the regex code above was suggested, and it still doesn't work. I also tried this:

Data_Frame['Elipses'] = Data_Frame.Subject_Line.str.match('.\.\.\')

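No dice.
The code shell above works for the other variables I've created, but I'm also having trouble producing 0-1 output instead of True/False (in R this would be an "as.numeric" argument).
Thank you!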

python

Source: https://stackoverflow.com/questions/46529659/how-to-find-ellipses-in-text-string-python

1 Answer

lymgl2op1#

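Using search() instead of match() finds an ellipsis anywhere in the text, because match() only matches at the start of the string. In Pandas, str.contains() supports regular expressions and searches the whole string, for example: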

import pandas as pd

df = pd.DataFrame({'Text' : ["hello..", "again... this", "is......a test",  "Real ellipses… here", "...not here"]})
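# A word followed by three or more dots, or the single-character ellipsis "…".
# No capture group is used, so str.contains() does not warn about match groups.
df['Ellipses'] = df.Text.str.contains(r'\w+\.{3,}|…')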

print(df)

This gives:

                  Text  Ellipses
0              hello..     False
1        again... this      True
2       is......a test      True
3  Real ellipses… here      True
4          ...not here     False
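
The question also asks for 0/1 output rather than True/False. As a minimal sketch, assuming the same sample data as above: casting the boolean result with astype(int) plays roughly the role of R's as.numeric. The column name Ellipses_Flag and the na=False argument (which treats missing subject lines as "no ellipsis") are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Text': ["hello..", "again... this", "is......a test",
                            "Real ellipses… here", "...not here"]})

# A word followed by three or more dots, or the single-character ellipsis.
pattern = r'\w+\.{3,}|…'

# str.contains() returns booleans; na=False maps missing values to False,
# and astype(int) then converts True/False to 1/0.
df['Ellipses_Flag'] = df['Text'].str.contains(pattern, na=False).astype(int)

flags = df['Ellipses_Flag'].tolist()
print(flags)
```

The same cast can be applied to any of the other True/False indicator columns.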

Or without Pandas:

import re

for test in ["hello..", "again... this", "is......a test",  "Real ellipses… here", "...not here"]:
    print(int(bool(re.search(r'\w+(\.{3,})|…', test))))

This matches the three middle tests, giving:

0
1
1
1
0

Take a look at the search() vs. match() section of the Python documentation, which explains the difference well.
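
To make the difference concrete, here is a small sketch using only the standard re module. The sample string and variable names are hypothetical, chosen for illustration: match() anchors at the start of the string, while search() scans for the first occurrence anywhere.

```python
import re

pattern = r'\.{3,}'          # three or more literal dots
text = "again... this"

# re.match() only tries the pattern at position 0, where the text is "a".
at_start = re.match(pattern, text)

# re.search() scans the whole string and finds the dots after "again".
anywhere = re.search(pattern, text)

print(at_start)         # None
print(anywhere.span())  # (5, 8)
```

This is likely why the str.match() attempts in the question miss subject lines where the dots do not follow the opening word.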
To display the word that precedes each ellipsis:

import re
    
for test in ["hello..", "again... this", "is......a test",  "...def"]:
    ellipses = re.search(r'(\w+)\.{3,}', test)
    
    if ellipses:
        print(ellipses.group(1))

This gives:

again
is
