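I am writing a scrapy spider. One of its callback methods yields requests whose callback is set to NO_CALLBACK.
The parse callback yields new requests with the callback parameter set to NO_CALLBACK, which should mean that scrapy does not call any callback at all. Instead, scrapy calls it, which raises the following error: RuntimeError: The NO_CALLBACK callback has been called. This is a special callback value intended for requests whose callback is never meant to be called.
Here is the code: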

from scrapy import Spider
from scrapy.http import TextResponse
from scrapy.http.request import NO_CALLBACK

class AppsSpider(Spider):
    name = "Apps"
    allowed_domains = ['steampowered.com', 'steamstatic.com']
    start_urls = ['https://store.steampowered.com/app/20']

    def parse(self, response: TextResponse):
        # preview media
        preview_section = response.css('#game_highlights')
        main_image_selector = '.game_header_image_full::attr("src")'
        preview_img_selector = '.highlight_screenshot a::attr("href")'
        preview_videos_selector = '.highlight_movie::attr("data-mp4-hd-source")'
        links = preview_section.css(
            ', '.join([main_image_selector, preview_img_selector, preview_videos_selector])).getall()

        # description section media
        description_section = response.css('#aboutThisGame')
        description_img_gif_selector = 'img::attr("src")'
        links += description_section.css(description_img_gif_selector).getall()

        yield from response.follow_all(links, callback=NO_CALLBACK)

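I tried to fix it by removing errback, cb_kwargs, meta, and dont_filter=True. None of that worked.
The docs say:
When assigned to the callback parameter of Request, it indicates that the request is not meant to have a spider callback at all.
Edit: edited into an MRE; you can run it with scrapy runspider nameofscript.py
Here is the traceback: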

2023-04-02 21:12:11 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cdn.cloudflare.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708> (referer: https://store.steampowered.com/app/20)
Traceback (most recent call last):
  File "C:\Users\leiver\miniconda3\envs\steam-scraping\Lib\site-packages\twisted\internet\defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\leiver\miniconda3\envs\steam-scraping\Lib\site-packages\scrapy\http\request\__init__.py", line 40, in NO_CALLBACK
    raise RuntimeError(
RuntimeError: The NO_CALLBACK callback has been called. This is a special callback value intended for requests whose callback is never meant to be called.

def NO_CALLBACK(*args, **kwargs):
    """When assigned to the ``callback`` parameter of
    :class:`~scrapy.http.Request`, it indicates that the request is not meant
    to have a spider callback at all.

    For example:

    .. code-block:: python

       Request("https://example.com", callback=NO_CALLBACK)

    This value should be used by :ref:`components <topics-components>` that
    create and handle their own requests, e.g. through
    :meth:`scrapy.core.engine.ExecutionEngine.download`, so that downloader
    middlewares handling such requests can treat them differently from requests
    intended for the :meth:`~scrapy.Spider.parse` callback.
    """
    raise RuntimeError(
        "The NO_CALLBACK callback has been called. This is a special callback "
        "value intended for requests whose callback is never meant to be "
        "called."
    )

1 Answer

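There is nothing magic about NO_CALLBACK. When you submit a request to the scrapy engine, it always tries to process the response with either the default callback or the callback given in the callback parameter, and that includes NO_CALLBACK. What NO_CALLBACK is meant to do is act as a sort of flag, so that you can write custom middleware that checks for it and treats those requests differently from standard scrapy requests.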
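As a rough sketch of the "flag" idea, a downloader middleware could compare a request's callback against NO_CALLBACK and mark the request. The class name TagNoCallbackMiddleware and the meta key "component_request" are invented for illustration. The fallback definition and the SimpleNamespace objects are stand-ins so the snippet runs without scrapy installed; they are not real scrapy requests.

```python
from types import SimpleNamespace

try:
    # The real marker, when Scrapy is installed.
    from scrapy.http.request import NO_CALLBACK
except ImportError:
    # Stand-in (assumption) so the sketch also runs without Scrapy.
    def NO_CALLBACK(*args, **kwargs):
        raise RuntimeError("The NO_CALLBACK callback has been called.")


class TagNoCallbackMiddleware:
    """Hypothetical downloader middleware that flags NO_CALLBACK requests."""

    def process_request(self, request, spider):
        # Identity check: NO_CALLBACK is a plain function used as a marker.
        if request.callback is NO_CALLBACK:
            # "component_request" is an invented meta key for illustration.
            request.meta["component_request"] = True
        # Returning None lets the request continue through the downloader.
        return None


# Duck-typed stand-ins for scrapy.Request, used only for this demonstration.
internal = SimpleNamespace(callback=NO_CALLBACK, meta={})
regular = SimpleNamespace(callback=None, meta={})

middleware = TagNoCallbackMiddleware()
middleware.process_request(internal, spider=None)
middleware.process_request(regular, spider=None)

print(internal.meta)
print(regular.meta)
```

Note that tagging a request this way does not stop the engine from calling its callback; it only lets other components recognise the request.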
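If you look at the source code for scrapy.http.request.NO_CALLBACK (reproduced above, after the traceback), you will see that it does nothing except raise the RuntimeError.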

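If you search the docs, the only example you will find that uses it never actually yields the request; instead it passes the request directly to the scrapy engine.
Example from the docs: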

request = scrapy.Request(screenshot_url, callback=NO_CALLBACK)
response = await maybe_deferred_to_future(
    spider.crawler.engine.download(request, spider)
)
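The difference between yielding a request and downloading it directly can be shown with a toy model. This is not scrapy's implementation: ToyRequest, ToyEngine, and their methods are invented stand-ins, and NO_CALLBACK is redefined locally so the snippet runs on its own.

```python
def NO_CALLBACK(*args, **kwargs):
    # Mirrors the behaviour of the real function: it only raises.
    raise RuntimeError("The NO_CALLBACK callback has been called.")


class ToyRequest:
    """Invented stand-in for a request: a URL plus an optional callback."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


class ToyEngine:
    """Invented stand-in for the engine; not Scrapy's implementation."""

    def download(self, request):
        # Direct download: the response goes back to the caller and the
        # request's callback is never consulted.
        return f"<response for {request.url}>"

    def crawl(self, request, default_callback):
        # Yielded request: the response is handed to the request's callback,
        # or to the default callback when none was set.
        response = self.download(request)
        callback = request.callback or default_callback
        return callback(response)


engine = ToyEngine()
request = ToyRequest("https://example.com/image.jpg", callback=NO_CALLBACK)

# Path 1: download directly, as the docs example does.
direct = engine.download(request)

# Path 2: schedule the request as if it had been yielded from a spider.
try:
    engine.crawl(request, default_callback=print)
    crawl_error = None
except RuntimeError as exc:
    crawl_error = str(exc)

print(direct)
print(crawl_error)
```

In this model only the second path reaches NO_CALLBACK, which matches the traceback in the question: the requests there were yielded from parse rather than downloaded by the component that created them.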

2023-04-06
