regex 如何在正则表达式中使用列表元素

0yycz8jy 于 2023-04-22 发布在其他

关注(0)|答案(1)|浏览(109)

我正在构建一个正则表达式查询（在Python 3.7中）来搜索字符串中的一些模式。这样做的目的是将日期与查询的其余部分分开。
日期可以采用多种格式，特别是月份。我想在查询中集成列表元素，使其更加强大。在下面的示例中，我可以获得具有多种格式的日期内容，如“oct. 2019 / febr，2018 / 1997 / 16 January 2012，08/09/84，08/09/1964，等..."。有些格式是陌生的，但我必须处理这个。
我想要的一个例子：

# Python 3.7
import re 

text = """
    Some text with some oct date oct. 2017
    Here is a quote from 2018
    Now we can talk about some articles published the 27 January 2017
    This example is the critic's point this is the end. 2019.
"""

dt_elements = ["oct,", "oct.", "jan."] # And all the stuff I can think about

for line in text.split("\n"):
    if re.search(r"[a-zA-Zéû]{3}. \d{4}", line):
        data = re.search(r"[a-zA-Zéû]{3}. \d{4}", line).group()[0]  # How to integrate dt_elements  here?!
        # This does not works

# Take care about critic example below!!!

就是这样：）
PS：对于其他日期格式，如%D/%M/%Y或%D Janvier %Y，我已经有一些正在工作的查询，我正在寻找解决上述问题的方法

regex

来源：https://stackoverflow.com/questions/76022427/regex-how-to-use-list-elements-in-regex

1条答案

按热度按时间

0kjbasz61#

解析（半）自由文本中的日期可能会非常麻烦--你可以使用dateparser的search_dates功能来处理一些情况，这样你就可以专注于更奇怪的部分：-）
就将月份名称/缩写列表转换为正则表达式模式而言，您可以使用re.escape函数来确保各个元素中没有特殊字符，然后使用|将它们连接起来以生成您的模式。
还有一些其他的事情要注意-你可以更灵活地使用大写/小写的敏感性和空格，并检查你的月份/年份的任何一端的单词边界，例如。

months = ["oct,", "oct.", "jan.", "january"]
months_pattern = (
  "(?:" +
    "|".join(re.escape(month) for month in months) +
  ")"
)

year_pattern = r"\d{4}"  # can maybe limit to r"20[012]\d" ?

date_pattern = r"(?:\b" + months_pattern + r"\s*)?(?:\b" + year_pattern + r"\b)"

for line in text.split("\n"):
    match = re.search(date_pattern, line, re.IGNORECASE)
    if match:
        data = match.group()
        print(data)

输出

oct. 2017
2018
January 2017
2019

赞(0）回复(0）举报 2023-04-22

我来回答

regex 如何在正则表达式中使用列表元素

1条答案

相关问题

热门标签

最新问答