I'm building a simple spider for a project, and I'm getting this error in my code. It still runs, but I'd like to understand and fix it. My spider looks like this:
class BookSpider(scrapy.Spider):
    name = "books"

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)

    def __init__(self, stats):
        self.stats = stats
        self.visited_pages = []
The error message looks like this:
2018-05-23 17:00:50 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://www.goodreads.com/book/show/35036409-my-brilliant-friend> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2018-05-23 17:00:50 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.goodreads.com/book/show/17465515-the-story-of-a-new-name> (referer: https://www.goodreads.com/book/show/35036409-my-brilliant-friend)
Traceback (most recent call last):
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
yield next(it)
GeneratorExit
Unhandled error in Deferred:
2018-05-23 17:00:50 [twisted] CRITICAL: Unhandled error in Deferred:
2018-05-23 17:00:50 [twisted] CRITICAL:
Traceback (most recent call last):
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/twisted/internet/task.py", line 517, in _oneWorkUnit
result = next(self._iterator)
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/scrapy/utils/defer.py", line 63, in <genexpr>
work = (callable(elem, *args,**named) for elem in iterable)
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/scrapy/core/scraper.py", line 183, in _process_spidermw_output
self.crawler.engine.crawl(request=output, spider=spider)
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/scrapy/core/engine.py", line 210, in crawl
self.schedule(request, spider)
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/scrapy/core/engine.py", line 216, in schedule
if not self.slot.scheduler.enqueue_request(request):
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/scrapy/core/scheduler.py", line 55, in enqueue_request
self.df.log(request, self.spider)
File "/home/m17/elefano/miniconda3/lib/python3.6/site-packages/scrapy/dupefilters.py", line 73, in log
spider.crawler.stats.inc_value('dupefilter/filtered', spider=spider)
AttributeError: 'BookSpider' object has no attribute 'crawler'
I have a vague idea that it's an initialization problem, but I can't figure out what's wrong.
2 Answers
Answer 1:
I think your spider isn't hooking into the crawler class correctly. When I ran into this error, I was able to fix it by adding a super() call in the from_crawler() method, which attaches the crawler attributes/methods to your custom spider. The Scrapy docs contain an example (see the from_crawler method).

Source: https://doc.scrapy.org/en/latest/topics/signals.html
Answer 2:
Adding a crawler parameter to __init__ will also fix this.
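Concretely, the idea is to pass the whole crawler into the spider instead of just crawler.stats, and set `self.crawler` yourself. A minimal sketch, again using a hypothetical `_FakeCrawler`/`_FakeStats` in place of Scrapy's real objects so it runs standalone:

```python
class _FakeStats:
    """Stand-in for crawler.stats (illustration only, not Scrapy code)."""
    def __init__(self):
        self.counters = {}

    def inc_value(self, key, spider=None):
        self.counters[key] = self.counters.get(key, 0) + 1

class _FakeCrawler:
    """Stand-in for scrapy.crawler.Crawler (illustration only)."""
    def __init__(self):
        self.stats = _FakeStats()

class BookSpider:
    name = "books"

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)  # pass the crawler itself, not crawler.stats

    def __init__(self, crawler):
        self.crawler = crawler       # the dupefilter reads spider.crawler
        self.stats = crawler.stats
        self.visited_pages = []

crawler = _FakeCrawler()
spider = BookSpider.from_crawler(crawler)
# The call that crashed in the traceback now succeeds:
spider.crawler.stats.inc_value("dupefilter/filtered", spider=spider)
```

Note that in real Scrapy, the super()-based approach from the first answer is the more idiomatic fix, since it also runs the base spider's own initialization.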