Hi, I have two pipelines. The first one downloads images:
import hashlib
import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline

# IMAGES_STORE is defined elsewhere and is expected to end with a path separator.
class ModelsPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Request every image URL listed on the item.
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def file_path(self, request, response=None, info=None, *, item=None):
        # Temporary file name: a short hash of the URL, in a folder named after the item.
        image_url_hash = hashlib.shake_256(request.url.encode()).hexdigest(5)
        image_filename = f'{item["name"]}/{image_url_hash}.jpg'
        return image_filename

    def item_completed(self, results, item, info):
        # Rename each downloaded file to the MD5 of its contents.
        image_paths = [x['path'] for ok, x in results if ok]
        for image in image_paths:
            file_extension = os.path.splitext(image)[1]
            img_path = f'{IMAGES_STORE}{image}'
            with open(img_path, 'rb') as f:
                md5 = hashlib.md5(f.read()).hexdigest()
            img_destination = f'{IMAGES_STORE}{item["name"]}/{md5}{file_extension}'
            os.rename(img_path, img_destination)
        return item
The second one stores that information in a database:
class DatabasePipeline:
    def open_spider(self, spider):
        self.client = db_connect()

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.client.upsert(item)
        return item  # hand the item on to any later pipeline
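The item_completed function in the first pipeline produces a name and a path. I would like to send them to the second pipeline so they can be stored in the database, but I cannot access that data there.
My question is: how can I do this?
Thanks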
1 Answer
zengzsys #1
You can add the name and the path to the item in ModelsPipeline:
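The code for this step did not survive in the post, so here is a minimal sketch of what it might look like. The helper name attach_image_paths and the item key image_paths are assumptions made for illustration, not names from the question. It uses os.replace instead of os.rename so that a duplicate image (same MD5) overwrites the earlier copy instead of failing on some platforms.

```python
import hashlib
import os


def attach_image_paths(results, item, images_store):
    """Rename each downloaded image to its MD5 and record the new paths on the item.

    `results` has the shape Scrapy passes to item_completed: a list of
    (success, info) tuples, where info['path'] is relative to images_store.
    'image_paths' is a hypothetical key chosen for this sketch.
    """
    final_paths = []
    for ok, result in results:
        if not ok:
            continue  # skip failed downloads
        src = os.path.join(images_store, result['path'])
        extension = os.path.splitext(src)[1]
        with open(src, 'rb') as f:
            md5 = hashlib.md5(f.read()).hexdigest()
        dst = os.path.join(images_store, item['name'], f'{md5}{extension}')
        os.replace(src, dst)  # overwrites dst if an identical image already exists
        final_paths.append(dst)
    # Anything stored on the item is visible to pipelines that run later.
    item['image_paths'] = final_paths
    return item
```

Inside ModelsPipeline.item_completed this could be called as `return attach_image_paths(results, item, IMAGES_STORE)`. If the item is a scrapy.Item subclass and not a plain dict, the new field has to be declared on the class first, otherwise the assignment raises KeyError.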
In the process_item method of DatabasePipeline, you can then access it:
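The code for this step is also missing from the post. A rough sketch, assuming an earlier pipeline stored the final paths on the item under a hypothetical image_paths key, and assuming the client's upsert method accepts a plain dict (the question only shows it being called with the item):

```python
class DatabasePipeline:
    def open_spider(self, spider):
        # db_connect() is the asker's own helper; its definition is not shown.
        self.client = db_connect()

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Read the values that the image pipeline attached to the item.
        record = {
            'name': item['name'],
            'image_paths': item.get('image_paths', []),  # hypothetical key
        }
        self.client.upsert(record)
        return item  # hand the item on to any later pipeline
```

For this to work the image pipeline has to run first, so in the ITEM_PIPELINES setting ModelsPipeline needs a lower order number than DatabasePipeline.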