Scrapy: how to make scrapy-playwright retry failed requests

cl25kdpy · asked on 2023-10-20 · category: Other

I want to write a crawler using scrapy-playwright. In previous projects I used plain Scrapy with RETRY_TIMES = 3: even when I could not reach the resource I needed, the spider would attempt the request three times before shutting down.
I tried the same here, but it does not seem to work: the spider shuts down at the first error it gets. Can anyone help? What should I do so the spider retries a URL as many times as I need?
Here is the relevant part of my settings.py:

import random  # needed for random.uniform below

RETRY_ENABLED = True
RETRY_TIMES = 3
DOWNLOAD_TIMEOUT = 60
DOWNLOAD_DELAY = random.uniform(0, 1)

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Thanks in advance!

6jygbczu · answer #1

Make sure you catch and log the exceptions raised in your Playwright code. That will help you determine whether the Playwright script itself is hitting an error that triggers the spider shutdown (a minimal errback sketch is shown after the settings below).

import random  # required for random.uniform below

RETRY_ENABLED = True
RETRY_TIMES = 3  # retry each failed request up to 3 times
DOWNLOAD_TIMEOUT = 60  # per-request download timeout, in seconds
DOWNLOAD_DELAY = random.uniform(0, 1)  # a random delay, evaluated once when settings load

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
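
Whether Scrapy's RetryMiddleware picks up a Playwright-side failure depends on which exception is raised, so an errback at least shows you exactly what went wrong. Below is a minimal sketch, not taken from the original post: the spider name, the target URL and the manual_retry_times meta key are illustrative assumptions, and re-issuing the request by hand is just one way to force extra attempts.

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    max_manual_retries = 3

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={"playwright": True},
            callback=self.parse,
            errback=self.errback,  # called when the download / Playwright step fails
        )

    def parse(self, response):
        self.logger.info("Got %s", response.url)

    def errback(self, failure):
        # Log the exact exception so you can see what Playwright raised
        # (e.g. a navigation timeout) before the spider gives up.
        self.logger.error("Request to %s failed: %r", failure.request.url, failure.value)
        retries = failure.request.meta.get("manual_retry_times", 0)
        if retries < self.max_manual_retries:
            # Re-issue the same request; dont_filter avoids the duplicate filter.
            yield failure.request.replace(
                dont_filter=True,
                meta={**failure.request.meta, "manual_retry_times": retries + 1},
            )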

You already have DOWNLOAD_TIMEOUT set to 60 seconds, which is fairly generous. Make sure the timeout is not too short for the kind of requests you are making: if a request takes a long time to respond, that can affect the retry behaviour.
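
One more point worth checking, offered as an assumption rather than something from the original post: scrapy-playwright has its own navigation timeout (PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT, given in milliseconds), independent of Scrapy's DOWNLOAD_TIMEOUT, so if the page itself is slow it can help to keep the two consistent. A small settings sketch with assumed values:

# settings.py sketch: keep the Scrapy-side and Playwright-side timeouts aligned
DOWNLOAD_TIMEOUT = 60                               # Scrapy download timeout, seconds
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 60 * 1000   # Playwright navigation timeout, ms

A single slow URL can also override the Scrapy-side timeout per request with meta={"download_timeout": 120} instead of raising the global value.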
