Problem downloading images with Scrapy

iqjalb3h  posted on 2022-11-09 in Other

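I have built a simple scraper to download images from a website. Unfortunately, none of the images are being downloaded. I have searched for similar problems and tried the suggested fixes, but they do not work for me. I had this working in the past, so I don't understand why it doesn't work now.
My spider: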

import scrapy
from scrapy_exercises.items import ScrapyExercisesItem

class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://www.meadowhall.co.uk/eatdrinkshop?page=1']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse
            )

    def parse(self, response):

        content_page = response.xpath("//div[@class='view-content']//div")
        for cnt in content_page:

            link = cnt.xpath('.//a/@href').get()
            image_url = cnt.xpath(".//img//@src").get()

            if link != None:
                items = ScrapyExercisesItem()
                items['images'] = [image_url.split('?')[0]]
                yield items

pipelines.py:

from scrapy.pipelines.images import ImagesPipeline
class DownfilesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        image_name: str = request.url.split("/")[-1]
        return image_name

settings.py:

ITEM_PIPELINES = {
    'scrapy_exercises.pipelines.DownfilesPipeline': 55
    }
IMAGES_STORE = '.'

items.py:

class ScrapyExercisesItem(scrapy.Item):
    images = scrapy.Field()

jhiyze9q  #1

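I think all you need to do is add a couple of settings and include a results field in your item class. By default, ImagesPipeline reads the URLs to download from an item field named image_urls and writes the download results to a field named images. Your item only defines images, so the pipeline finds no URLs to download.
In your items.py file, add the following: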

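import scrapy

class ScrapyExercisesItem(scrapy.Item):
    images = scrapy.Field()   # image URLs to download (read via IMAGES_URLS_FIELD)
    results = scrapy.Field()  # download results written by the pipeline (IMAGES_RESULT_FIELD)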

Then add the following to your settings.py file:

IMAGES_URLS_FIELD = 'images'
IMAGES_RESULT_FIELD = 'results'
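
To see why these two settings matter, here is a rough model of the field lookup in plain Python. It is only a sketch and not Scrapy's actual source: the function name urls_seen_by_pipeline and the example.com URL are made up for illustration, and the only thing taken from Scrapy is the documented default field name image_urls.

```python
from typing import Any, Dict, List

# Scrapy's documented default for the IMAGES_URLS_FIELD setting.
DEFAULT_URLS_FIELD = "image_urls"


def urls_seen_by_pipeline(item: Dict[str, Any], settings: Dict[str, str]) -> List[str]:
    """Simplified model: return the URLs the pipeline would try to download.

    The pipeline looks up one field name (from settings, or the default)
    and treats a missing field as "no URLs".
    """
    field = settings.get("IMAGES_URLS_FIELD", DEFAULT_URLS_FIELD)
    return list(item.get(field, []))


# Hypothetical item shaped like the one the spider yields.
item = {"images": ["https://example.com/a.jpg"]}

# Default settings: the pipeline looks for "image_urls" and finds nothing.
print(urls_seen_by_pipeline(item, {}))

# With IMAGES_URLS_FIELD = 'images' the URLs become visible.
print(urls_seen_by_pipeline(item, {"IMAGES_URLS_FIELD": "images"}))
```

An alternative that should work equally well is to keep the default settings and rename the item fields to image_urls and images.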

Then try again.
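
If nothing is downloaded even then, it may be worth checking the URLs themselves. I can't tell from the question whether the src values on that page are relative, but Scrapy requests need an absolute URL with a scheme, and the spider's image_url.split('?') call will raise an error for any block that has no img tag. Below is a small sketch of a helper that handles both cases using only the standard library. The name clean_image_url and the /images/logo.png path are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit


def clean_image_url(page_url: str, src: Optional[str]) -> Optional[str]:
    """Return an absolute image URL without query string, or None if src is missing."""
    if not src:
        return None
    # Resolve relative paths such as "/images/logo.png" against the page URL.
    absolute = urljoin(page_url, src)
    parts = urlsplit(absolute)
    # Rebuild the URL with the query string and fragment dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


page = "https://www.meadowhall.co.uk/eatdrinkshop?page=1"
print(clean_image_url(page, "/images/logo.png?itok=abc"))  # hypothetical src value
print(clean_image_url(page, None))
```

In the spider this could be called as clean_image_url(response.url, image_url), skipping the item when it returns None. Scrapy's own response.urljoin(image_url) covers the relative-URL part as well.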
