Scrapy: resizing image thumbnails - 2022

Posted by smtd7mpg on 2022-11-09 in Other

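Hi, I'm trying to resize image thumbnails in Scrapy. I've seen some posts about resizing, but they seem to address older versions of Scrapy.
Requirement: thumbnails that are too small should be resized (upscaled) to the sizes specified in the settings file: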

  • 'zoro': (500, 500),
  • 'small': (116, 90),
  • 'large': (386, 300),
  • 'zoom': (648, 504)
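Some context for the requirement: as far as I can tell from the Scrapy 2.x source, the stock `ImagesPipeline.convert_image` builds each thumbnail with Pillow's `Image.thumbnail()`, which keeps the aspect ratio and only ever shrinks, so an image smaller than a box in `IMAGES_THUMBS` is saved at its original size. The plain-Python sketch below contrasts that shrink-only rule with the upscaling asked for here; `fit_in_box` is a hypothetical helper, and its round-to-nearest-pixel step is an assumption, not Pillow's exact arithmetic.

```python
def fit_in_box(size, box, allow_upscale=False):
    """Largest (width, height) with the aspect ratio of `size` that fits in `box`.

    allow_upscale=False models the shrink-only rule of Image.thumbnail();
    allow_upscale=True also enlarges images smaller than the box.
    """
    width, height = size
    box_width, box_height = box
    scale = min(box_width / width, box_height / height)
    if not allow_upscale:
        scale = min(scale, 1.0)  # never enlarge, like thumbnail()
    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_in_box((100, 50), (500, 500)))                      # (100, 50)
print(fit_in_box((100, 50), (500, 500), allow_upscale=True))  # (500, 250)
```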

Here is my code:
Settings.py

BOT_NAME = 'project'

SPIDER_MODULES = ['project.spiders']
NEWSPIDER_MODULE = 'project.spiders'

...

LOG_STDOUT = True
LOG_FILE = 'scrapy_log.log'

ITEM_PIPELINES = {
   'project.pipelines.ImagePipeline': 1,
}
IMAGES_STORE = 'image_dir'
IMAGES_URLS_FIELD = 'image_urls'
IMAGES_RESULT_FIELD = 'images'

IMAGES_THUMBS = {
    'zoro': (500, 500),
    'small': (116, 90),
    'large': (386, 300),
    'zoom': (648, 504)
}
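With these settings and the `thumb_path` override in Pipelines.py, every key of `IMAGES_THUMBS` should end up as its own subfolder under `IMAGES_STORE`. The sketch below rebuilds that naming scheme in plain Python so the expected layout can be checked without running a crawl; `thumb_paths` is a hypothetical helper and `PCIe-RS232` is only an example file-name stem.

```python
import posixpath

IMAGES_STORE = "image_dir"
IMAGES_THUMBS = {
    "zoro": (500, 500),
    "small": (116, 90),
    "large": (386, 300),
    "zoom": (648, 504),
}


def thumb_paths(filename, file_num, thumbs=IMAGES_THUMBS, store=IMAGES_STORE):
    """Map each thumb_id to the file the pipeline is expected to write.

    Follows the question's thumb_path(): thumbs/<thumb_id>/<filename>_<n>.jpg,
    shown here relative to IMAGES_STORE with '/' separators.
    """
    return {
        thumb_id: posixpath.join(store, "thumbs", thumb_id, f"{filename}_{file_num}.jpg")
        for thumb_id in thumbs
    }


for thumb_id, path in thumb_paths("PCIe-RS232", 0).items():
    print(thumb_id, path)  # e.g. small image_dir/thumbs/small/PCIe-RS232_0.jpg
```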

Spider.py

import scrapy
from ..items import ItemImage

class ImageDownload(scrapy.Spider):
    name = 'ImageDownload'
    allowed_domains = ['antaira.com']

    def start_requests(self):
        urls = [
            'https://www.antaira.com/products/PCIe-RS232',
        ]
        for url in urls:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # iterate through each of the relative urls
        for url in response.xpath('//div[@class="product-container"]//a/@href').getall():
            product_link = response.urljoin(url)  # use variable
            yield scrapy.Request(product_link, callback=self.parse_new_item, dont_filter=True)

    def parse_new_item(self, response):
        item = ItemImage()
        raw_image_urls = response.xpath('//div[@class="selectors"]/a/@href').getall()
        name = response.xpath("//h1[@class='product-name']/text()").get()
        filename = name.split(' ')[0].strip()
        urls = [response.urljoin(i) for i in raw_image_urls]
        item["name"] = filename
        item["image_urls"] = urls
        yield item
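The `parse_new_item` callback boils down to two transformations: the first word of the product title becomes the file-name stem, and each relative `href` is made absolute. A standalone sketch of that logic, using `urllib.parse.urljoin` as a stand-in for `response.urljoin` (which, for HTML responses, also honours a `<base>` tag). `item_fields`, the sample title and the sample hrefs are hypothetical, and the `None` check is an addition the original spider does not have.

```python
from urllib.parse import urljoin


def item_fields(product_name, page_url, hrefs):
    """Build the 'name' and 'image_urls' fields the spider yields.

    Returns None when the page has no product title (the original code
    would raise AttributeError on name.split in that case).
    """
    if not product_name:
        return None
    stem = product_name.split(" ")[0].strip()  # first word of the title
    return {"name": stem, "image_urls": [urljoin(page_url, href) for href in hrefs]}


fields = item_fields(
    "PCIe-RS232 Serial Card",  # hypothetical product title
    "https://www.antaira.com/products/PCIe-RS232",
    ["/media/a.jpg", "media/b.jpg"],  # hypothetical image hrefs
)
print(fields)
```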

Pipelines.py

from io import BytesIO

from PIL import Image
from scrapy.http import Request
from scrapy.pipelines.images import ImagesPipeline


class ImagePipeline(ImagesPipeline):

    def file_path(self, request, response=None, info=None, *, item=None):
        # Full-size image: <name>_<n>.jpg, from the meta set in get_media_requests
        filename = request.meta["filename"].strip()
        number = request.meta["file_num"]
        return f"{filename}_{number}.jpg"

    def thumb_path(self, request, thumb_id, response=None, info=None, *, item=None):
        # Thumbnail: thumbs/<thumb_id>/<name>_<n>.jpg, one folder per IMAGES_THUMBS key
        filename = request.meta["filename"].strip()
        number = request.meta["file_num"]
        return f"thumbs/{thumb_id}/{filename}_{number}.jpg"

    def get_media_requests(self, item, info):
        name = item["name"]
        for i, url in enumerate(item["image_urls"]):
            meta = {"filename": name, "file_num": i}
            yield Request(url, meta=meta)

    def convert_image(self, image, size=None):
        if image.format == 'PNG' and image.mode == 'RGBA':
            # Flatten transparency onto a white background before saving as JPEG
            background = Image.new('RGBA', image.size, (255, 255, 255))
            background.paste(image, image)
            image = background.convert('RGB')
        elif image.mode != 'RGB':
            image = image.convert('RGB')
        if size is not None:  # a size is only passed for thumbnails
            image = image.copy()
            basewidth = size[0]  # the width from IMAGES_THUMBS in settings.py
            wpercent = basewidth / float(image.size[0])
            hsize = int(float(image.size[1]) * wpercent)
            # resize() (unlike thumbnail()) also enlarges small images;
            # Image.LANCZOS replaces Image.ANTIALIAS, which Pillow 10 removed
            image = image.resize((basewidth, hsize), Image.LANCZOS)

        buf = BytesIO()  # JPEG data is bytes: BytesIO, not Python 2's cStringIO
        image.save(buf, 'JPEG', quality=72)
        return image, buf
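The resize branch of `convert_image` scales by width only: it takes `size[0]` from `IMAGES_THUMBS`, derives the height from the source aspect ratio and never uses `size[1]`. The same arithmetic in isolation (`scale_to_width` is a hypothetical helper) shows that a 'small' thumbnail comes out at exactly 116x90 only when the source already has a 116:90 proportion.

```python
def scale_to_width(image_size, thumb_size):
    """Return the (width, height) the pipeline's convert_image() computes.

    Same steps as the pipeline: basewidth from the settings, the wpercent
    ratio, and the height truncated with int(). thumb_size[1] is unused.
    """
    basewidth = thumb_size[0]
    wpercent = basewidth / float(image_size[0])
    hsize = int(float(image_size[1]) * wpercent)
    return basewidth, hsize


print(scale_to_width((200, 100), (500, 500)))  # upscaled: (500, 250)
print(scale_to_width((232, 180), (116, 90)))   # same proportion: (116, 90)
```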

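Hey @Alex, I ran the code from your answer. It returns the regular images as expected, but I don't get any subfolders with the expected thumbnails. This is what I get in scrapy.log: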
Scrapy.log

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/files.py", line 465, in media_downloaded
    checksum = self.file_downloaded(response, request, info, item=item)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/media.py", line 140, in wrapper
    return func(*args,**kwargs)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/images.py", line 115, in file_downloaded
    return self.image_downloaded(response, request, info, item=item)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/media.py", line 140, in wrapper
    return func(*args,**kwargs)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/images.py", line 119, in image_downloaded
    for path, image, buf in self.get_images(response, request, info, item=item):
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/images.py", line 145, in get_images
    thumb_image, thumb_buf = self.convert_image(image, size)
  File "/home/joel/Desktop/project/project/pipelines.py", line 29, in convert_image
    image = image.resize(size, image.ANTIALIAS)
AttributeError: 'JpegImageFile' object has no attribute 'ANTIALIAS'
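The last frame points at the cause: `ANTIALIAS` is a constant of the `PIL.Image` module, so looking it up on the image object (`image.ANTIALIAS`, lowercase) raises `AttributeError`. Separately, Pillow 10 removed `Image.ANTIALIAS` altogether; `Image.LANCZOS` is the same filter, and Pillow 9.1+ also exposes it as `Image.Resampling.LANCZOS`. A version-tolerant lookup, sketched under the assumption that both old and new Pillow releases must be supported; `lanczos_filter` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for the real module.

```python
from types import SimpleNamespace


def lanczos_filter(image_module):
    """Return the LANCZOS resampling constant from a PIL.Image-like module.

    Pillow >= 9.1 has Image.Resampling.LANCZOS; older releases only have
    the top-level Image.LANCZOS. Always look it up on the module, never
    on an image instance.
    """
    resampling = getattr(image_module, "Resampling", None)
    if resampling is not None:
        return resampling.LANCZOS
    return image_module.LANCZOS


# Stand-ins for an old and a new Pillow, used only to exercise the lookup.
old_pillow = SimpleNamespace(LANCZOS="old")
new_pillow = SimpleNamespace(Resampling=SimpleNamespace(LANCZOS="new"), LANCZOS="old")
print(lanczos_filter(old_pillow), lanczos_filter(new_pillow))  # old new
```

In the pipeline this would be used as `image.resize(size, lanczos_filter(Image))` after `from PIL import Image`.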
zvokhttg (answer #1)

You just need to change the convert_image method in your pipeline to the following code.
I added some inline comments...

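from io import BytesIO

from PIL import Image
from scrapy.pipelines.images import ImagesPipeline


class ImagePipeline(ImagesPipeline):
    # file_path, thumb_path and get_media_requests stay as in the question

    def convert_image(self, image, size=None):
        if size is not None:  # if a size is given, this call is for a thumbnail,
            # so resize it to exactly that (width, height); resize() also upscales.
            # Image.LANCZOS is the filter Image.ANTIALIAS aliased (removed in Pillow 10).
            image = image.resize(size, Image.LANCZOS)
        else:
            # otherwise give the image back to the superclass version of
            # this method for it to process.
            return super().convert_image(image, size=size)
        buf = BytesIO()  # these next 3 lines are from the Scrapy source code
        image.save(buf, 'JPEG')
        return image, buf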
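Note that `image.resize(size, ...)` produces exactly the `IMAGES_THUMBS` dimensions by stretching, so a source with a different aspect ratio gets distorted. If that matters, one alternative (not part of this answer) is to scale until the box is fully covered and then center-crop, similar to what Pillow's `ImageOps.fit()` does in one call. The arithmetic, sketched in plain Python with a hypothetical `cover_and_crop` helper:

```python
def cover_and_crop(image_size, box):
    """Plan an exact-size thumbnail that keeps the aspect ratio.

    Returns (scaled_size, crop_box): resize the image to scaled_size, then
    crop crop_box = (left, upper, right, lower), which is exactly `box` in
    size. Small images are enlarged, large ones shrunk.
    """
    width, height = image_size
    box_width, box_height = box
    scale = max(box_width / width, box_height / height)  # cover the whole box
    scaled = (max(box_width, round(width * scale)),
              max(box_height, round(height * scale)))
    left = (scaled[0] - box_width) // 2    # center horizontally
    upper = (scaled[1] - box_height) // 2  # center vertically
    return scaled, (left, upper, left + box_width, upper + box_height)


print(cover_and_crop((100, 50), (500, 500)))  # ((1000, 500), (250, 0, 750, 500))
```

With Pillow, the plan would be applied in the thumbnail branch as `image.resize(scaled, Image.LANCZOS).crop(crop_box)`.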

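You can also remove all the unnecessary imports in your pipeline file, especially from cStringIO import StringIO, because that import raises an error on its own (cStringIO exists only in Python 2).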
