REGEX: findall complete tag a with target parameter in python

Posted by 2vuwiymt on 2023-10-14 in Python


I have the following problem, and I'm new to regular expressions. I've tried everything I can think of, but I can't work out how to solve it.

text = '<a href="https://url1.com/q234243/" target="_blank">la orden  texto, texto ,texto jurídicas del bla bla bla que obliga a la corporación que rige al  tenis bla bla bla, el patrocinador de sus principales asuntos y<a href="https://url2.com/124345w23/" target="_blank"> el dictamen de texto random y mas texto random apu, que se han convertido en uno de los rammstein es la ostia económicos de la actividad, al punto de'

I need to use re.findall() to get the a tags, like this:

urls = ['<a href="https://url1.com/q234243/" target="_blank">',
        '<a href="https://url2.com/124345w23/" target="_blank">']

python

Source: https://stackoverflow.com/questions/77289408/regex-findall-complete-tag-a-with-target-parameter-in-python

2 Answers


ldioqlga1#

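You can use the built-in HTMLParser to get the data easily, no matter how malformed the markup is. In this example we override feed and handle_starttag to tailor the parser to your needs. The module is simple and straightforward; a quick look at the documentation is enough to make it do whatever you want. It is the same parser that BeautifulSoup uses when you pass "html.parser" as its second argument.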

from html.parser import HTMLParser

text = '<a href="https://url1.com/q234243/" target="_blank">la orden  texto, texto ,texto jurídicas del bla bla bla que obliga a la corporación que rige al  tenis bla bla bla, el patrocinador de sus principales asuntos y<a href="https://url2.com/124345w23/" target="_blank"> el dictamen de texto random y mas texto random apu, que se han convertido en uno de los rammstein es la ostia económicos de la actividad, al punto de'

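class Parser(HTMLParser):
    def __init__(self):
        super().__init__()
        # raw start-tag text of every <a> seen so far
        self.anchors = []

    def feed(self, html: str) -> list:
        # parse the input, then return the anchors collected so far
        super().feed(html)
        return self.anchors

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # called for every opening tag; keep only <a>
        if tag == 'a':
            self.anchors.append(self.get_starttag_text())


parser = Parser()
anchors = parser.feed(text)
print(anchors)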

Result

['<a href="https://url1.com/q234243/" target="_blank">', 
'<a href="https://url2.com/124345w23/" target="_blank">']
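
Since the question asks specifically about a tags with a target attribute, the same approach can also filter on attrs, which handle_starttag receives as a list of (name, value) pairs. Below is a minimal sketch under that assumption; the class name TargetAnchorParser, the shortened sample string and the example.com URL are made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser


class TargetAnchorParser(HTMLParser):
    """Hypothetical helper: collects the raw start tag of every <a> with a target attribute."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases the names
        if tag == "a" and "target" in dict(attrs):
            self.anchors.append(self.get_starttag_text())


# Shortened, made-up sample: the middle anchor has no target attribute
sample = (
    '<a href="https://url1.com/q234243/" target="_blank">first '
    '<a href="https://example.com/no-target/">second '
    '<a href="https://url2.com/124345w23/" target="_blank">third'
)

target_parser = TargetAnchorParser()
target_parser.feed(sample)
target_parser.close()
print(target_parser.anchors)
```

Here the anchor without a target attribute is skipped, while the other two are returned with their original text.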

Answered on 2023-10-14

j1dl9f462#

If you really want to use a regex:

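import re

# The lookarounds leave out the angle brackets, so each match is the text between "<" and ">"
re.findall(r'(?<=<)[^>]+(?=>)', text)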
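
To keep the angle brackets and match only a tags that carry a target attribute, one option is a pattern without lookarounds. This is a minimal sketch; the pattern, the name ANCHOR_WITH_TARGET and the shortened sample text are my own assumptions and not part of the original answer. Regexes stay fragile on HTML in general; for example, a ">" inside an attribute value would cut the match short.

```python
import re

# Shortened sample for illustration; it has the same tag structure as the text in the question
sample = (
    '<a href="https://url1.com/q234243/" target="_blank">la orden texto '
    '<a href="https://url2.com/124345w23/" target="_blank"> el dictamen'
)

# <a\b          literal "<a" followed by a word boundary, so <abbr> is not matched
# [^>]*         any attributes that come before target
# \starget\s*=  the target attribute, preceded by whitespace (skips data-target)
# [^>]*>        the rest of the tag, up to and including the closing ">"
ANCHOR_WITH_TARGET = re.compile(r'<a\b[^>]*\starget\s*=[^>]*>', re.IGNORECASE)

urls = ANCHOR_WITH_TARGET.findall(sample)
print(urls)
```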

Answered on 2023-10-14
