Development environment
Windows 11
PyCharm Community Edition 2021.3.1
Python 3.10
I'm following the tutorial Download Images By Python and Scrapy, but my script doesn't work.
spider.py
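import scrapy


class WikiSpider(scrapy.Spider):
    name = 'wiki'
    start_urls = ['https://en.wikipedia.org/wiki/Real_Madrid_CF']

    def parse(self, response):
        # src attribute of every <img> inside an element with class "image"
        urls = response.css('.image img ::attr(src)').getall()
        # Turn relative / protocol-relative src values into absolute URLs
        clean_urls = []
        for url in urls:
            clean_urls.append(response.urljoin(url))
        # ImagesPipeline reads the list of URLs from the 'image_urls' key
        # (the original used the undefined name clean_url here)
        yield {
            'image_urls': clean_urls
        }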
settings.py
BOT_NAME = 'imagedownload'
SPIDER_MODULES = ['imagedownload.spiders']
NEWSPIDER_MODULE = 'imagedownload.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = 'images_folder'
# Obey robots.txt rules
ROBOTSTXT_OBEY = True
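With IMAGES_STORE = 'images_folder', ImagesPipeline's default naming saves each downloaded file as images_folder/full/<SHA1 hex digest of the URL>.jpg, with the relative folder resolved against the directory where scrapy crawl is started. The sketch below mirrors that naming so you can check where a given URL should end up; image_store_path is a hypothetical helper, and it assumes a plain ASCII URL (Scrapy hashes the request URL after its own escaping).

```python
import hashlib
import os


def image_store_path(images_store: str, url: str) -> str:
    """Hypothetical helper mirroring ImagesPipeline's default file_path().

    Assumes an ASCII URL; Scrapy hashes the request URL as UTF-8 bytes.
    """
    # Files are named by the SHA1 hex digest of the image URL ...
    image_guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
    # ... placed in a "full" subfolder, always with a .jpg extension,
    # because ImagesPipeline converts downloaded images to JPEG
    return os.path.join(images_store, 'full', f'{image_guid}.jpg')


if __name__ == '__main__':
    print(image_store_path('images_folder', 'https://upload.wikimedia.org/example.png'))
```

If that folder stays empty after a crawl, the pipeline never received or never processed any image_urls.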
In the tutorial, items.py and pipelines.py were not modified. When I run my spider, it runs without errors and I can see the parsed image URLs, but the images are not downloaded.
Steps I have taken to solve the problem
1. Set ROBOTSTXT_OBEY = False
2. Added this snippet to my spider.py file
save_location = os.getcwd()  # needs "import os" at the top of spider.py
custom_settings = {
    "ITEM_PIPELINES": {'scrapy.pipelines.images.ImagesPipeline': 1},
    "IMAGES_STORE": save_location
}
3. Tried adding this snippet to settings.py
import os
IMAGES_STORE = os.getcwd()
Any help would be appreciated!
What I expect is for the script to download images
1 answer
dgiusagp
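You're close. I think the problem is that you haven't defined proper Field entries for the image results in the dict you yield. I would suggest using a custom Scrapy Item with the fields predefined; you can declare it in the same file as your spider to keep things simple. Then add all of the ImagesPipeline settings to the custom_settings dict in your Spider class. For example: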