Scrapy: scraping the feature image from this website returns 'data:image/gif' instead

Posted by k4ymrczo on 2022-11-29 in Other


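I am using Scrapy and the Scrapy shell in Python to scrape the feature image from https://www.thrillist.com/travel/nation/all-the-ways-to-cool-off-in-austin, but the selector returns data:image/gif;base64,R0 instead of the image's source URL. How can I fix this so that I get the actual image source?
This is my code: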

Feature_Image = [i.strip() for i in response.xpath('//*[@id="main-content"]/article/div/div/div[2]/div[1]/picture/img/@src').getall()][0]

scrapy

Source: https://stackoverflow.com/questions/74120927/scrape-the-feature-image-from-this-website-but-it-returns-this-dataimage-gif

2 Answers


uklbhaso #1

The largest image on the page should be the desktop version; that is just common sense. So why not look up its source like this?

pic = response.xpath('//picture[@data-testid="picture-tag"]//source[@data-size="desktop"]/@srcset').get()

The result is the source of the largest version of the page's poster image:

https://assets3.thrillist.com/v1/image/3086882/1584x1056/crop;webp=auto;jpeg_quality=60;progressive.jpg
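In general a srcset value can be missing or can list several comma-separated candidates, each followed by a width or density descriptor. The following is a minimal sketch of a helper that handles both cases; the name first_srcset_url is hypothetical, and the comma split is deliberately naive (it assumes the URLs themselves contain no commas, which holds for the URL shown above).

```python
from typing import Optional
from urllib.parse import urljoin


def first_srcset_url(srcset: Optional[str], base_url: str = "") -> Optional[str]:
    """Return the first candidate URL from a srcset value, or None.

    Hypothetical helper: assumes candidates are comma-separated and that
    the URLs themselves do not contain commas or whitespace.
    """
    if not srcset:
        return None
    # Each candidate looks like "URL [descriptor]", e.g. "/img.jpg 2x".
    first_candidate = srcset.split(",")[0].strip()
    if not first_candidate:
        return None
    url = first_candidate.split()[0]
    # Resolve relative URLs against the page URL when one is supplied.
    return urljoin(base_url, url)


# Possible use inside a spider callback or the Scrapy shell:
# pic = first_srcset_url(
#     response.xpath('//picture[@data-testid="picture-tag"]'
#                    '//source[@data-size="desktop"]/@srcset').get(),
#     response.url,
# )
```

Returning None instead of raising keeps the callback alive on pages where the desktop source element is absent.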

Answered 2022-11-29

ryoqjall #2

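It looks like the img tag has a data-src attribute that holds the link along with some image parameters. Parsing that text and taking the first part gives you the link.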

>>> link = response.xpath("//div[@data-element-type='ParagraphMainImage']//img/@data-src").get().split(";")[0]
>>> link
'https://assets3.thrillist.com/v1/image/3086882/414x310/crop'

If you want to make the image type explicit, you can manually append .jpg to the end. The link works with or without the extension.
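The approach above can be wrapped in small helpers that also guard against a missing attribute and against the base64 value the question ran into, which is presumably a lazy-loading placeholder in src. This is only a sketch: clean_image_url and pick_image_url are hypothetical names, and the sample data-src value in the usage comment is illustrative, not copied from the page.

```python
from typing import Optional


def clean_image_url(raw: Optional[str], extension: str = "") -> Optional[str]:
    """Strip the ';'-separated image parameters from a URL.

    Returns None for a missing value or a data: URI (placeholder image).
    `extension` is optionally appended, e.g. ".jpg".
    """
    if not raw or raw.strip().startswith("data:"):
        return None
    base = raw.split(";")[0].strip()
    return base + extension


def pick_image_url(src: Optional[str], data_src: Optional[str]) -> Optional[str]:
    """Prefer data-src, falling back to src if it is a real URL."""
    return clean_image_url(data_src) or clean_image_url(src)


# Possible use in the Scrapy shell (selector taken from the answer above):
# img = response.xpath("//div[@data-element-type='ParagraphMainImage']//img")
# link = pick_image_url(img.xpath("@src").get(), img.xpath("@data-src").get())
```

Unlike calling .split() directly on the result of .get(), these helpers do not raise AttributeError when the attribute is absent.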

Answered 2022-11-29
