scrapy twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed

ca1c2owp · asked on 2022-11-09

I get this error when I run a crawl process multiple times. I'm using Scrapy 2.6. Here is my code:

from scrapy.crawler import CrawlerProcess
from football.spiders.laliga import LaligaSpider
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(settings=get_project_settings())
for i in range(1, 29):
    process.crawl(LaligaSpider, week=i)
process.start()

u4dcyp6a1#

For me this worked: I put the following right before creating the CrawlerProcess.

import sys

# Un-import the reactor so the next import installs a fresh one
# instead of raising ReactorAlreadyInstalledError
if "twisted.internet.reactor" in sys.modules:
    del sys.modules["twisted.internet.reactor"]
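
A minimal sketch of where this fits in the script from the question (assuming the same football project and LaligaSpider as above):

import sys

# Un-import the reactor if a previous run already installed one,
# so CrawlerProcess can set up its own
if "twisted.internet.reactor" in sys.modules:
    del sys.modules["twisted.internet.reactor"]

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from football.spiders.laliga import LaligaSpider

process = CrawlerProcess(settings=get_project_settings())
for i in range(1, 29):
    process.crawl(LaligaSpider, week=i)
process.start()  # blocks here until all 28 crawls finish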

p1tboqfb2#

This solution avoids using CrawlerProcess, as described in the docs: https://docs.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script
There's another Scrapy utility that provides more control over the crawling process: scrapy.crawler.CrawlerRunner. This class is a thin wrapper that encapsulates some simple helpers for running multiple crawlers, but it won't start or interfere with existing reactors in any way.
If your application is already using Twisted and you want to run Scrapy in the same reactor, you are recommended to use CrawlerRunner instead of CrawlerProcess.

from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging

from football.spiders.laliga import LaligaSpider

# Enable logging for CrawlerRunner

configure_logging()

runner = CrawlerRunner(settings=get_project_settings())
for i in range(1, 29):
    runner.crawl(LaligaSpider, week=i)

deferred = runner.join()
deferred.addBoth(lambda _: reactor.stop())

reactor.run()  # the script will block here until all crawling jobs are finished
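
If the 28 crawls should run one after another instead of concurrently, the same docs page also shows a sequential pattern using twisted.internet.defer.inlineCallbacks; here is a sketch adapted to the spider from the question (assuming the same football project):

from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

from football.spiders.laliga import LaligaSpider

configure_logging()
runner = CrawlerRunner(settings=get_project_settings())

@defer.inlineCallbacks
def crawl():
    # each yield waits for the previous crawl to finish before starting the next
    for i in range(1, 29):
        yield runner.crawl(LaligaSpider, week=i)
    reactor.stop()

crawl()
reactor.run()  # the script will block here until the last crawl has finished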

h6my8fg23#

I ran into this problem too. It seems the claim in the docs at https://docs.scrapy.org/en/latest/topics/practices.html that CrawlerProcess can be used to run multiple crawlers built with spiders is incorrect here, because each new crawler tries to load a new reactor instance when you give it a spider. I was able to get my code working by using CrawlerRunner instead, which is also covered on the same page.

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

configure_logging()
settings = get_project_settings() # settings not required if running
runner = CrawlerRunner(settings)  # from script, defaults provided
runner.crawl(MySpider1) # your loop would go here
runner.crawl(MySpider2)
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run() # the script will block here until all crawling jobs are finished

llycmphe4#

I had this problem too, and it went away after updating Scrapy and Twisted. Current package versions: Twisted==22.8.0, Scrapy==2.6.2.
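
To upgrade, something like pip install --upgrade Scrapy Twisted should do; a quick sanity check of the installed versions (a minimal sketch, nothing project-specific assumed):

import scrapy
import twisted

# Print the installed versions to confirm the upgrade took effect
print("Scrapy:", scrapy.__version__)        # expecting 2.6.2 or newer
print("Twisted:", twisted.version.short())  # expecting 22.8.0 or newer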
