To access settings from __init__, I added a from_crawler @classmethod. Now it seems some of Scrapy's built-in functionality is lost: I get AttributeError: 'Code1Spider' object has no attribute 'crawler' when a URL fails and the spider tries to retry the request. The Scrapy version is 2.0.1, and the spider runs on Zyte cloud.
What am I doing wrong, and how do I fix it?
Here is my spider code:
import logging
from datetime import datetime

import scrapy


class Code1Spider(scrapy.Spider):
    name = 'cointelegraph_pr'
    allowed_domains = ['cointelegraph.com']
    start_urls = ['https://cointelegraph.com/press-releases']

    def __init__(self, settings):
        # Returns settings values as a dict
        settings = settings.copy_to_dict()
        self.id = int(str(datetime.now().timestamp()).split('.')[0])
        self.gs_id = settings.get('GS_ID')
        self.endpoint_url = settings.get('ENDPOINT_URL')
        self.zyte_api_key = settings.get('ZYTE_API_KEY')
        self.zyte_project_id = settings.get('ZYTE_PROJECT_ID')
        self.zyte_collection_name = self.name
        # Loads a list of stop words from a predefined Google Sheet
        self.denied = load_gsheet(self.gs_id)
        # Loads all URLs scraped in previous runs from Zyte collections
        self.scraped_urls = load_from_collection(self.zyte_project_id, self.zyte_collection_name, self.zyte_api_key)
        logging.info("###############################")
        logging.info("Number of previously scraped URLs = {}.".format(len(self.scraped_urls)))
        logging.info("")

    # We need this to pass settings into __init__. Otherwise settings would be accessible only after __init__.
    # As per https://docs.scrapy.org/en/1.8/topics/settings.html#how-to-access-settings
    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(settings)
Here is the error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 42, in process_request
defer.returnValue((yield download_func(request=request, spider=spider)))
File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1362, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <504 https://cointelegraph.com/press-releases/the-launch-of-santa-browser-to-bring-in-the-next-200m-users-onto-web30>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 51, in process_response
response = yield deferred_from_coro(method(request=request, response=response, spider=spider))
File "/usr/local/lib/python3.8/site-packages/scrapy/downloadermiddlewares/retry.py", line 53, in process_response
return self._retry(request, reason, spider) or response
File "/usr/local/lib/python3.8/site-packages/scrapy/downloadermiddlewares/retry.py", line 69, in _retry
stats = spider.crawler.stats
AttributeError: 'Code1Spider' object has no attribute 'crawler'
Everything else is the default Scrapy spider; no settings or middlewares were modified. What am I doing wrong, and how can I fix it?
1 Answer
This happens because you overrode the from_crawler method without assigning the crawler to the spider. Change the from_crawler method to the following:
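The answer's code block did not survive on this page, so here is a minimal, self-contained sketch of the fix it describes: from_crawler must attach the crawler to the spider, which Scrapy's base Spider.from_crawler does via _set_crawler. The FakeSettings, FakeCrawler, and BaseSpider classes below are simplified stand-ins for illustration only, not the real Scrapy API:

```python
# Stand-ins for scrapy's Settings, Crawler, and Spider (illustrative only).
class FakeSettings(dict):
    def copy_to_dict(self):
        return dict(self)

class FakeCrawler:
    def __init__(self, settings):
        self.settings = FakeSettings(settings)
        self.stats = object()  # RetryMiddleware reads spider.crawler.stats

class BaseSpider:
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = cls(*args, **kwargs)
        spider._set_crawler(crawler)  # this step is what the broken override skips
        return spider

    def _set_crawler(self, crawler):
        self.crawler = crawler
        self.settings = crawler.settings

class Code1Spider(BaseSpider):
    def __init__(self, settings):
        self.gs_id = settings.copy_to_dict().get('GS_ID')

    # Fixed override: build the spider from settings, then attach the
    # crawler exactly as the base class would have done.
    @classmethod
    def from_crawler(cls, crawler):
        spider = cls(crawler.settings)
        spider._set_crawler(crawler)
        return spider

crawler = FakeCrawler({'GS_ID': 'sheet-123'})
spider = Code1Spider.from_crawler(crawler)
print(spider.crawler is crawler)  # True: retry middleware can now reach stats
```

Applied to the real spider, the change has the same shape: build the spider with cls(crawler.settings), then call spider._set_crawler(crawler) before returning it (or construct it through super().from_crawler(crawler, crawler.settings), which attaches the crawler for you), so that RetryMiddleware can reach spider.crawler.stats.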