scrapy: Why does my XPath expression return only part of the href and not the whole href?

Posted by lnvxswe2 on 2022-11-09 in Other


In the Scrapy shell, the XPath expression response.xpath('//a[@class="playerName"]/@href') returns only the latter part of the href I want to scrape. The site I am scraping is https://www.premierleague.com/players. My XPath expression returns only "/player/63289/Brenden-Aaronson/overview" instead of "https://www.premierleague.com/player/63289/Brenden-Aaronson/overview".
[Image: screenshot of the HTML source code in question]

scrapy

Source: https://stackoverflow.com/questions/73726271/why-my-xpath-expression-only-returning-part-of-the-href-and-not-the-whole-href

1 Answer


vhmi4jdf1#

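There are two types of links on the web: absolute and relative.
Web pages often use relative links to point to their own internal content. A relative link is resolved against the URL of the page it appears on, so /foo.html on a page at https://example.com/ points to https://example.com/foo.html.
The XPath expression returns the href exactly as it is written in the page's HTML, which here is a relative link. If you want to convert it to an absolute link, you can use the urllib.parse.urljoin function: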

from urllib.parse import urljoin

relative_url = "/foo.html"
print(urljoin(response.url, relative_url))

# in your case:

relative_url = response.xpath('//a[@class="playerName"]/@href').get()
print(urljoin(response.url, relative_url))
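To see the resolution without a live Scrapy response, here is a minimal standalone sketch. It assumes the page URL from the question as the base and uses a hypothetical helper name, to_absolute, that is not part of Scrapy or the original answer. The second href is an illustrative already-absolute link, included to show that urljoin leaves such links unchanged.

```python
from urllib.parse import urljoin

# Assumed inputs for illustration: the page URL from the question and
# hrefs shaped like the one the question reports.
page_url = "https://www.premierleague.com/players"
hrefs = [
    "/player/63289/Brenden-Aaronson/overview",  # relative, as returned by the XPath
    "https://www.premierleague.com/players",    # already absolute (illustrative)
]


def to_absolute(base_url, links):
    """Resolve each link against base_url (hypothetical helper).

    Relative links are joined onto the base; absolute links pass
    through unchanged.
    """
    return [urljoin(base_url, link) for link in links]


absolute_links = to_absolute(page_url, hrefs)
for link in absolute_links:
    print(link)
```

Inside a spider or the Scrapy shell, the same approach should work on every match by passing response.url and response.xpath('//a[@class="playerName"]/@href').getall() to the helper. Scrapy responses also provide response.urljoin(relative_url), which resolves against the response's own URL.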

2022-11-09
