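I am writing a Scrapy spider. In one of its callback methods it yields requests whose callback is set to NO_CALLBACK.
The parse callback yields new requests with the callback argument set to NO_CALLBACK, which is supposed to mean that Scrapy does not call any callback at all. Instead, Scrapy calls it and raises the following error: RuntimeError: The NO_CALLBACK callback has been called. This is a special callback value intended for requests whose callback is never meant to be called.
Here is the code: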
from scrapy import Spider
from scrapy.http import TextResponse
from scrapy.http.request import NO_CALLBACK


class AppsSpider(Spider):
    name = "Apps"
    allowed_domains = ['steampowered.com', 'steamstatic.com']
    start_urls = ['https://store.steampowered.com/app/20']

    def parse(self, response: TextResponse):
        # preview media
        preview_section = response.css('#game_highlights')
        main_image_selector = '.game_header_image_full::attr("src")'
        preview_img_selector = '.highlight_screenshot a::attr("href")'
        preview_videos_selector = '.highlight_movie::attr("data-mp4-hd-source")'
        links = preview_section.css(
            ', '.join([main_image_selector, preview_img_selector, preview_videos_selector])).getall()

        # description section media
        description_section = response.css('#aboutThisGame')
        description_img_gif_selector = 'img::attr("src")'
        links += description_section.css(description_img_gif_selector).getall()

        yield from response.follow_all(links, callback=NO_CALLBACK)
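I tried to fix it by removing errback, cb_kwargs, meta, and dont_filter=True. None of that worked.
The docs say:
When assigned to the callback parameter of Request, it indicates that the request is not meant to have a spider callback at all.
Edit: reduced to an MRE; you can run it with scrapy runspider nameofscript.py
Here is the traceback: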
2023-04-02 21:12:11 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cdn.cloudflare.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708> (referer: https://store.steampowered.com/app/20)
Traceback (most recent call last):
  File "C:\Users\leiver\miniconda3\envs\steam-scraping\Lib\site-packages\twisted\internet\defer.py", line 892, in _runCallbacks
    current.result = callback( # type: ignore[misc]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\leiver\miniconda3\envs\steam-scraping\Lib\site-packages\scrapy\http\request\__init__.py", line 40, in NO_CALLBACK
    raise RuntimeError(
RuntimeError: The NO_CALLBACK callback has been called. This is a special callback value intended for requests whose callback is never meant to be called.
1 Answer
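NO_CALLBACK isn't anything magical. When you submit a request to the Scrapy engine, by default it will always try to process the response with either the default callback or the callback given in the callback argument, even if that callback is NO_CALLBACK.
What NO_CALLBACK is supposed to do is act as a sort of flag, so that you can write custom middleware that looks for it and treats those requests differently from standard Scrapy requests.
If you look at the source code of scrapy.http.request.NO_CALLBACK, you will see that it is a function that does nothing except raise the RuntimeError shown in the traceback above.
If you search the documentation, the only example you will find that uses it never actually yields the request; it calls the Scrapy engine directly instead.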
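The behavior described here can be modeled without Scrapy at all. The following is a simplified, self-contained sketch and not Scrapy's actual source: FakeRequest, default_parse, dispatch_to_spider, and download_only are hypothetical names, and the NO_CALLBACK defined in it is only a stand-in that mimics the function seen in the traceback. It illustrates why a request that goes through the normal spider path ends up calling NO_CALLBACK, while a request whose response is consumed directly by the caller does not.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


def NO_CALLBACK(*args: Any, **kwargs: Any) -> None:
    """Stand-in for scrapy.http.request.NO_CALLBACK (assumption: it only raises)."""
    raise RuntimeError("The NO_CALLBACK callback has been called.")


@dataclass
class FakeRequest:
    """Hypothetical minimal request: a URL plus an optional callback."""
    url: str
    callback: Optional[Callable[..., Any]] = None


def default_parse(response: str) -> str:
    """Plays the role of the spider's default callback."""
    return f"parsed {response}"


def dispatch_to_spider(request: FakeRequest, response: str) -> Any:
    # Whatever callback is set gets called; NO_CALLBACK is not skipped.
    callback = request.callback or default_parse
    return callback(response)


def download_only(request: FakeRequest) -> str:
    # The caller consumes the response itself, so no callback is involved.
    return f"<response for {request.url}>"


request = FakeRequest("https://example.com/image.jpg", callback=NO_CALLBACK)

try:
    dispatch_to_spider(request, "<response>")
except RuntimeError as exc:
    print("spider path:", exc)

print("direct path:", download_only(request))
```

In this model, NO_CALLBACK only protects against accidental use: it turns a request that was wrongly routed to the spider into a loud error rather than a silent no-op.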
Documentation example