Scrapy + Celery: soft time limit not triggered

2izufjch · posted 2022-11-23 in Other

I have a Celery task with a soft time limit of 10 and a hard limit of 32:

import sys

from celery.exceptions import SoftTimeLimitExceeded
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# `app` is the Celery application instance, defined elsewhere in the project.
@app.task(bind=True, acks_late=False, time_limit=32, soft_time_limit=10)
def my_task(self, **kwargs):
    try:
        # Drop any previously installed reactor so CrawlerProcess can install a fresh one.
        if 'twisted.internet.reactor' in sys.modules:
            del sys.modules['twisted.internet.reactor']
        settings = get_project_settings()
        process = CrawlerProcess(settings)
        process.crawl(**kwargs)
        process.start()
    except SoftTimeLimitExceeded:
        print('Time Exceeded...')

The code above runs as expected. However, when the crawl takes a long time and the soft limit is reached, no exception is raised. The crawl keeps going until the hard limit is hit, at which point the following error is thrown:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 684, in on_hard_timeout
    raise TimeLimitExceeded(job._timeout)
billiard.exceptions.TimeLimitExceeded: TimeLimitExceeded(32,)

I cannot even catch that error from inside the task. As a test, I replaced the process.start() call with time.sleep(50), so no crawl is started but a long delay is simulated:

import sys
import time

# Same imports as above for app, SoftTimeLimitExceeded, CrawlerProcess,
# and get_project_settings.

@app.task(bind=True, acks_late=False, time_limit=32, soft_time_limit=10)
def my_task(self, **kwargs):
    try:
        if 'twisted.internet.reactor' in sys.modules:
            del sys.modules['twisted.internet.reactor']
        settings = get_project_settings()
        process = CrawlerProcess(settings)
        process.crawl(**kwargs)
        time.sleep(50)  # simulate a long delay instead of starting the crawl
    except SoftTimeLimitExceeded:
        print('Time Exceeded...')

In that case SoftTimeLimitExceeded is caught. Why is this happening?

Versions

celery == 5.2.7
Scrapy == 2.6.1

yks3o0rb 1#

The same problem happens on my side as well.
I think the most likely cause is that the SoftTimeLimitExceeded error is being caught somewhere inside your script, so it never propagates up to the outer handler.
Check whether your script has any broad except clauses around this part, and remove them or replace them with narrowly scoped exceptions:

settings = get_project_settings()
process = CrawlerProcess(settings)
process.crawl(**kwargs)

That is only my guess. I will try it this way, and if it works on my side I will post an update here. A minimal sketch of the idea follows.
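A minimal sketch of what I mean (do_work is just a placeholder, not part of the question's code): if anything that runs inside the task wraps its work in a broad except Exception, the soft-limit exception is absorbed there and never reaches the task's own handler; re-raising SoftTimeLimitExceeded lets it propagate.

from celery.exceptions import SoftTimeLimitExceeded

def risky_step():
    try:
        do_work()  # placeholder for the long-running work
    except Exception:
        # A broad handler like this swallows the SoftTimeLimitExceeded
        # that Celery raises inside the worker, so the task's outer
        # try/except never sees it.
        pass

def risky_step_fixed():
    try:
        do_work()
    except SoftTimeLimitExceeded:
        # Re-raise so the task-level handler can run.
        raise
    except Exception:
        # Keep absorbing only the errors you actually expect.
        pass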
