Finding all hyperlinks (<a>, </a>) and creating a list of them

Posted by vq8itlhq on 2021-09-29 in Java


I have extracted all the hyperlinks of a paragraph, and they are formatted like this:

"<p>Source: <a href=""www.source1.com"">Source 1</a>, <a href=""www.secondsource.com"">Second source</a>, <a href=""www.andthisisthelastone.com"">Third source</a></p>"

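Every paragraph has its own hyperlinks. They are stored as strings in a DataFrame, so each paragraph has its own row containing a string made up of its hyperlinks.
Now I am trying to turn each string into a list in the following format: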

['Source 1#www.source1.com', 'Second source#www.secondsource.com', 'Third source#www.andthisisthelastone.com']

I came up with the following:

import re

hyperlinks = []
for string in string_hyperlinks:  # one hyperlink string per DataFrame row
    hyperlinks.append(re.findall(r'(https?://[^\s]+)', string))

This gives results of the following general form:

['www.source.com"">Source 1</a>', 'www.secondsource.com"">Second source', 'www.andthisisthelastsource.com"">Third source</a></p>']

How can I convert this into the format shown above?

python

Source: https://stackoverflow.com/questions/68541004/finding-all-hyperlinks-a-a-and-creating-a-list-of-this

2 answers


Answer 1, by dldeef67

Try adding a question mark after the "+" in the regular expression, like this: (https?://[^\s]+?)

2021-09-29
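A lazy quantifier only changes the result when the pattern continues after it: at the very end of a pattern, [^\s]+? stops after a single character. The sketch below puts lazy groups between fixed delimiters so each group stops at the first closing quote or the first </a>. It is a minimal illustration, assuming the attributes use ordinary double quotes (not the doubled "" from the question) and every link is written exactly as <a href="...">...</a>; for anything less regular, an HTML parser is the safer choice.

```python
import re

# Sample paragraph from the question, with ordinary (not doubled) quotes.
s = ('<p>Source: <a href="www.source1.com">Source 1</a>, '
     '<a href="www.secondsource.com">Second source</a>, '
     '<a href="www.andthisisthelastone.com">Third source</a></p>')

# (.*?) is lazy: the first group stops at the first '">',
# the second stops at the first '</a>'.
# Assumption: every link looks exactly like <a href="...">...</a>.
pattern = re.compile(r'<a href="(.*?)">(.*?)</a>')

# findall returns (href, text) tuples; reorder them into 'text#href'.
links = [f'{text}#{href}' for href, text in pattern.findall(s)]
print(links)
```

This prints ['Source 1#www.source1.com', 'Second source#www.secondsource.com', 'Third source#www.andthisisthelastone.com'].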

Answer 2, by q35jwt9p

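The HTML you provided is invalid (the attribute values are wrapped in doubled quotes ""); please check it.
You can use BeautifulSoup to select all <a> tags, extract the data you need from each one, and collect it in a list.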

from bs4 import BeautifulSoup

s = '''<p>Source: <a href="www.source1.com">Source 1</a>, <a href="www.secondsource.com">Second source</a>, <a href="www.andthisisthelastone.com">Third source</a></p>'''

soup = BeautifulSoup(s, 'html.parser')
res = []

# Visit every <a> tag in the fragment
for a in soup.find_all('a'):
    text = a.text.strip()   # visible link text, e.g. "Source 1"
    href = a['href']        # value of the href attribute
    res.append(f'{text}#{href}')

print(res)

['Source 1#www.source1.com', 'Second source#www.secondsource.com', 'Third source#www.andthisisthelastone.com']

2021-09-29
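The question stores one paragraph per DataFrame row, so the extraction has to run once per string. The sketch below does this without third-party packages by swapping BeautifulSoup for the standard library's html.parser.HTMLParser; the class LinkCollector, the function extract_links, and the sample rows are hypothetical names for illustration, not part of either answer.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects 'text#href' strings for every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get('href') or ''
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            text = ''.join(self._text).strip()
            self.links.append(f'{text}#{self._href}')
            self._href = None


def extract_links(html):
    """Return ['text#href', ...] for one paragraph of HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical rows standing in for the DataFrame column.
rows = [
    '<p>Source: <a href="www.source1.com">Source 1</a>, '
    '<a href="www.secondsource.com">Second source</a></p>',
    '<p>Source: <a href="www.andthisisthelastone.com">Third source</a></p>',
]

per_row = [extract_links(row) for row in rows]
print(per_row)
```

With pandas, the same function could be applied to a column with Series.apply, e.g. df['html'].apply(extract_links), where 'html' is a placeholder for the real column name.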
