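Hello, I am trying to resize image thumbnails in Scrapy. I have seen some posts about resizing, but they seem to address past versions of Scrapy.
Requirement: thumbnails that are too small should be resized/upscaled to the required sizes specified in the settings file:
- 'zoro': (500, 500),
- 'small': (116, 90),
- 'large': (386, 300),
- 'zoom': (648, 504)
Here is my code: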
Settings.py
BOT_NAME = 'project'
SPIDER_MODULES = ['project.spiders']
NEWSPIDER_MODULE = 'project.spiders'
...
LOG_STDOUT = True
LOG_FILE = 'scrapy_log.log'
ITEM_PIPELINES = {
    'project.pipelines.ImagePipeline': 1,
}
IMAGES_STORE = 'image_dir'
IMAGES_URLS_FIELD = 'image_urls'
IMAGES_RESULT_FIELD = 'images'
IMAGES_THUMBS = {
    'zoro': (500, 500),
    'small': (116, 90),
    'large': (386, 300),
    'zoom': (648, 504)
}
Spider.py
import scrapy
from ..items import ItemImage


class ImageDownload(scrapy.Spider):
    name = 'ImageDownload'
    allowed_domains = ['antaira.com']

    def start_requests(self):
        urls = [
            'https://www.antaira.com/products/PCIe-RS232',
        ]
        for url in urls:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # iterate through each of the relative urls
        for url in response.xpath('//div[@class="product-container"]//a/@href').getall():
            product_link = response.urljoin(url)  # use variable
            yield scrapy.Request(product_link, callback=self.parse_new_item, dont_filter=True)

    def parse_new_item(self, response):
        item = ItemImage()
        raw_image_urls = response.xpath('//div[@class="selectors"]/a/@href').getall()
        name = response.xpath("//h1[@class='product-name']/text()").get()
        filename = name.split(' ')[0].strip()
        urls = [response.urljoin(i) for i in raw_image_urls]
        item["name"] = filename
        item["image_urls"] = urls
        yield item
Pipelines.py
from scrapy.http import Request
from scrapy.pipelines.images import ImagesPipeline
from cStringIO import StringIO
import PIL
from PIL import Image


class ImagePipeline(ImagesPipeline):

    def file_path(self, request, response=None, info=None, *args, item=None):
        filename = request.meta["filename"].strip()
        number = request.meta["file_num"]
        return filename + "_" + str(number) + ".jpg"

    def thumb_path(self, request, thumb_id, response=None, info=None):
        filename = request.meta["filename"]
        number = request.meta["file_num"]
        return f'thumbs/{thumb_id}/{filename}_{str(number)}.jpg'

    def get_media_requests(self, item, info):
        name = item["name"]
        for i, url in enumerate(item["image_urls"]):
            meta = {"filename": name, "file_num": i}
            yield Request(url, meta=meta)

    def convert_image(self, image, size=None):
        if image.format == 'PNG' and image.mode == 'RGBA':
            background = Image.new('RGBA', image.size, (255, 255, 255))
            background.paste(image, image)
            image = background.convert('RGB')
        elif image.mode != 'RGB':
            image = image.convert('RGB')

        if size is None:
            image = image.copy()
            basewidth = size[0]  # the size from the settings.py
            wpercent = (basewidth/float(image.size[0]))
            hsize = int((float(image.size[1])*float(wpercent)))
            image = image.resize((basewidth,hsize), Image.ANTIALIAS)

        buf = StringIO()
        image.save(buf, 'JPEG', quality=72)
        return image, buf
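Hey @Alex, I ran the code from your reply. It returned the regular images as expected, but I did not get any subfolders containing the expected thumbnails. This is what I got in scrapy.log: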
Scrapy.log
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/files.py", line 465, in media_downloaded
    checksum = self.file_downloaded(response, request, info, item=item)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/media.py", line 140, in wrapper
    return func(*args,**kwargs)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/images.py", line 115, in file_downloaded
    return self.image_downloaded(response, request, info, item=item)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/media.py", line 140, in wrapper
    return func(*args,**kwargs)
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/images.py", line 119, in image_downloaded
    for path, image, buf in self.get_images(response, request, info, item=item):
  File "/home/joel/.local/lib/python3.8/site-packages/scrapy/pipelines/images.py", line 145, in get_images
    thumb_image, thumb_buf = self.convert_image(image, size)
  File "/home/joel/Desktop/project/project/pipelines.py", line 29, in convert_image
    image = image.resize(size, image.ANTIALIAS)
AttributeError: 'JpegImageFile' object has no attribute 'ANTIALIAS'
1 Answer
You only need to change the convert_image method in your pipeline to the following code. I added some inline comments...
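A minimal sketch of what such a convert_image could look like, not necessarily the exact code from this answer. It assumes Pillow is installed and is written as a standalone function so it can be tried outside Scrapy. The helper name scaled_size is hypothetical. The traceback above fails because ANTIALIAS was looked up on the image instance; filter constants live on the PIL Image module, and LANCZOS is the same filter under its current name.

```python
from io import BytesIO


def scaled_size(original_size, target_size):
    """Size whose width is target_size[0], keeping the original aspect ratio."""
    original_width, original_height = original_size
    target_width = target_size[0]
    # Integer arithmetic; floors the height like int() in the question's code
    target_height = max(1, original_height * target_width // original_width)
    return target_width, target_height


def convert_image(image, size=None):
    # In pipelines.py this import would normally sit at the top of the file.
    from PIL import Image

    if image.format == 'PNG' and image.mode == 'RGBA':
        # Flatten transparency onto a white background before saving as JPEG
        background = Image.new('RGBA', image.size, (255, 255, 255))
        background.paste(image, image)
        image = background.convert('RGB')
    elif image.mode != 'RGB':
        image = image.convert('RGB')

    if size is not None:
        # resize() can enlarge as well as shrink; thumbnail() only shrinks.
        # LANCZOS is a constant of the Image module, not of an image instance.
        image = image.resize(scaled_size(image.size, size), Image.LANCZOS)

    # BytesIO replaces the Python 2 cStringIO buffer
    buf = BytesIO()
    image.save(buf, 'JPEG', quality=72)
    return image, buf
```

To use it inside ImagePipeline, add self as the first parameter. Newer Scrapy releases may pass extra arguments to convert_image, so check the signature in the installed version before overriding it.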
You can also remove all the unnecessary imports from your pipeline file, especially from cStringIO import StringIO, since it raises an error on its own.
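On Python 3 the cStringIO module no longer exists, so that import fails with an ImportError before the pipeline even loads. io.BytesIO is the standard-library in-memory binary buffer that can take its place. A small sketch, using placeholder bytes rather than real image data:

```python
from io import BytesIO

try:
    # Python 2 only; on Python 3 this raises ImportError
    from cStringIO import StringIO  # noqa: F401
    has_cstringio = True
except ImportError:
    has_cstringio = False

# BytesIO behaves like a binary file held in memory,
# so it can be passed as the target of image.save()
buf = BytesIO()
buf.write(b'\xff\xd8\xff')  # placeholder bytes, not a complete JPEG
buf.seek(0)
data = buf.read()
```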