Scrapy + Celery: soft time limit not triggered

2izufjch · posted 2022-11-23 in Other

I have a Celery task with a soft time limit of 10 and a hard limit of 32:

import sys

from celery.exceptions import SoftTimeLimitExceeded
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# `app` is the Celery application instance, defined elsewhere in the project.
@app.task(bind=True, acks_late=False, time_limit=32, soft_time_limit=10)
def my_task(self, **kwargs):
    try:
        # Drop any previously installed reactor so CrawlerProcess can install a fresh one.
        if 'twisted.internet.reactor' in sys.modules:
            del sys.modules['twisted.internet.reactor']
        settings = get_project_settings()
        process = CrawlerProcess(settings)
        process.crawl(**kwargs)
        process.start()
    except SoftTimeLimitExceeded:
        print('Time Exceeded...')

The code above runs as expected. However, when the crawl takes a long time and the soft limit is reached, no exception is raised. The crawl keeps going until the hard limit is hit, at which point the following error is thrown:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 684, in on_hard_timeout
    raise TimeLimitExceeded(job._timeout)
billiard.exceptions.TimeLimitExceeded: TimeLimitExceeded(32,)

I cannot even catch that error from inside the task. As a test, I replaced the process.start() call with time.sleep(50), so no crawl is started but a long delay is simulated:

import sys
import time

# Same imports as above for app, SoftTimeLimitExceeded, CrawlerProcess,
# and get_project_settings.

@app.task(bind=True, acks_late=False, time_limit=32, soft_time_limit=10)
def my_task(self, **kwargs):
    try:
        if 'twisted.internet.reactor' in sys.modules:
            del sys.modules['twisted.internet.reactor']
        settings = get_project_settings()
        process = CrawlerProcess(settings)
        process.crawl(**kwargs)
        time.sleep(50)  # simulate a long delay instead of starting the crawl
    except SoftTimeLimitExceeded:
        print('Time Exceeded...')

In that case SoftTimeLimitExceeded is caught. Why is this happening?

Versions

celery == 5.2.7
Scrapy == 2.6.1

yks3o0rb 1#

The same problem happens on my side as well.
I think the most likely cause is that the SoftTimeLimitExceeded error is being caught somewhere inside your script, so it never propagates up to the outer handler.
Check whether your script has any broad except clauses around this part, and remove them or replace them with narrowly scoped exceptions:

settings = get_project_settings()
process = CrawlerProcess(settings)
process.crawl(**kwargs)

That is only my guess. I will try it this way, and if it works on my side I will post an update here. A minimal sketch of the idea follows.
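A minimal sketch of what I mean (do_work is just a placeholder, not part of the question's code): if anything that runs inside the task wraps its work in a broad except Exception, the soft-limit exception is absorbed there and never reaches the task's own handler; re-raising SoftTimeLimitExceeded lets it propagate.

from celery.exceptions import SoftTimeLimitExceeded

def risky_step():
    try:
        do_work()  # placeholder for the long-running work
    except Exception:
        # A broad handler like this swallows the SoftTimeLimitExceeded
        # that Celery raises inside the worker, so the task's outer
        # try/except never sees it.
        pass

def risky_step_fixed():
    try:
        do_work()
    except SoftTimeLimitExceeded:
        # Re-raise so the task-level handler can run.
        raise
    except Exception:
        # Keep absorbing only the errors you actually expect.
        pass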
