Scrapy NO_CALLBACK is called when it should not be

anhgbhbe  posted on 2023-04-06  in Other

I'm writing a Scrapy spider. Its parse callback yields new requests with the callback parameter set to NO_CALLBACK, which is supposed to tell Scrapy not to call any callback at all. Scrapy calls it anyway and raises the following error: RuntimeError: The NO_CALLBACK callback has been called. This is a special callback value intended for requests whose callback is never meant to be called.
Here is the code:

from scrapy import Spider
from scrapy.http import TextResponse
from scrapy.http.request import NO_CALLBACK

class AppsSpider(Spider):
    name = "Apps"
    allowed_domains = ['steampowered.com', 'steamstatic.com']
    start_urls = ['https://store.steampowered.com/app/20']

    def parse(self, response: TextResponse):
        # preview media
        preview_section = response.css('#game_highlights')
        main_image_selector = '.game_header_image_full::attr("src")'
        preview_img_selector = '.highlight_screenshot a::attr("href")'
        preview_videos_selector = '.highlight_movie::attr("data-mp4-hd-source")'
        links = preview_section.css(
            ', '.join([main_image_selector, preview_img_selector, preview_videos_selector])).getall()

        # description section media
        description_section = response.css('#aboutThisGame')
        description_img_gif_selector = 'img::attr("src")'
        links += description_section.css(description_img_gif_selector).getall()

        yield from response.follow_all(links, callback=NO_CALLBACK)

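I tried to fix it by removing the errback callback, cb_kwargs, meta, and dont_filter=True. None of that worked.
The docs say:
When assigned to the callback parameter of Request, it indicates that the request is not meant to have a spider callback at all.
Edit: trimmed the code down to an MRE; you can run it with scrapy runspider nameofscript.py
Here is the traceback: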

2023-04-02 21:12:11 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cdn.cloudflare.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708> (referer: https://store.steampowered.com/app/20)
Traceback (most recent call last):
  File "C:\Users\leiver\miniconda3\envs\steam-scraping\Lib\site-packages\twisted\internet\defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\leiver\miniconda3\envs\steam-scraping\Lib\site-packages\scrapy\http\request\__init__.py", line 40, in NO_CALLBACK
    raise RuntimeError(
RuntimeError: The NO_CALLBACK callback has been called. This is a special callback value intended for requests whose callback is never meant to be called.
vfwfrxfs1#

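There is nothing magical about NO_CALLBACK. When you hand a request to the Scrapy engine by yielding it from a spider, Scrapy will by default always try to process the response with a callback: either the default one or the one given in the callback parameter, even when that callback is NO_CALLBACK. What NO_CALLBACK is meant to do is act as a sort of flag, so that you can write custom middleware that checks for it and treats such requests differently from standard Scrapy requests.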
If we look at the source code of scrapy.http.request.NO_CALLBACK, you will see:

def NO_CALLBACK(*args, **kwargs):
    """When assigned to the ``callback`` parameter of
    :class:`~scrapy.http.Request`, it indicates that the request is not meant
    to have a spider callback at all.

    For example:

    .. code-block:: python

       Request("https://example.com", callback=NO_CALLBACK)

    This value should be used by :ref:`components <topics-components>` that
    create and handle their own requests, e.g. through
    :meth:`scrapy.core.engine.ExecutionEngine.download`, so that downloader
    middlewares handling such requests can treat them differently from requests
    intended for the :meth:`~scrapy.Spider.parse` callback.
    """
    raise RuntimeError(
        "The NO_CALLBACK callback has been called. This is a special callback "
        "value intended for requests whose callback is never meant to be "
        "called."
    )
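To make the failure mode concrete, here is a toy model in plain Python with no Scrapy involved. FakeRequest, default_parse, handle_yielded_request, and download_directly are hypothetical stand-ins for the two paths described in this answer, not Scrapy APIs. The only point is that a path which always calls request.callback will trip the RuntimeError, while a path that never calls it will not.

```python
def NO_CALLBACK(*args, **kwargs):
    # Same idea as the Scrapy function above: calling it is always an error.
    raise RuntimeError("The NO_CALLBACK callback has been called.")


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (URL and callback only)."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


def default_parse(body):
    """Stand-in for the spider's default callback."""
    return f"parsed: {body}"


def handle_yielded_request(request):
    """Models a request yielded from a spider: its callback is always called."""
    body = f"<body of {request.url}>"
    callback = request.callback or default_parse
    return callback(body)


def download_directly(request):
    """Models a component that fetches the request itself and never calls the callback."""
    return f"<body of {request.url}>"
```

With callback=NO_CALLBACK, download_directly returns the body while handle_yielded_request raises, which mirrors the traceback in the question.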

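If you search the docs, the only example you will find that uses this value never actually yields the request; it passes the request directly to the Scrapy engine instead.
Example from the docs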

request = scrapy.Request(screenshot_url, callback=NO_CALLBACK)
response = await maybe_deferred_to_future(
    spider.crawler.engine.download(request, spider)
)
