Scrapy: the installed reactor does not match the requested one

xdyibdwo  posted on 2022-11-09  in  React

I am trying to run the scroll example from the scrapy-playwright documentation against quotes.toscrape.com/scroll, but because of a reactor problem I cannot even get the documentation example to run:

URL SPIDER TEST

***********************

SCRAPE STARTED

***********************

2022-08-11 15:47:38 [scrapy.crawler] INFO: Overridden settings: {'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}

crawled: <Deferred at 0x11ef17e50 current result: <twisted.python.failure.Failure builtins.Exception: The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)>>

The code is:

import csv
import json
import pygsheets
import scrapy
from scrapy_playwright.page import PageMethod
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from scrapy.utils.reactor import install_reactor
from scrapy.crawler import CrawlerProcess
from scrapy.crawler import CrawlerRunner
import datetime as dt
from datetime import date
from twisted.internet import reactor, defer
import tempfile

def breaker(comment):
    print('***********************')
    print(comment)
    print('***********************')

class UrlSpider(scrapy.Spider):
    name = "Url"

    custom_settings={
        'DOWNLOAD_HANDLERS':{
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        yield scrapy.Request(
            url='http://quotes.toscrape.com/scroll',
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods=[
                    PageMethod('wait_for_selector','div.quote'),
                    PageMethod('evaluate','window.scrollBy(0, document.body.scrollHeight)'),
                    PageMethod('wait_for_selector','div.quote:nth-child(11)'),
                ],
            ),
        )

    async def parse(self, response):
        page=response.meta['playwright_page']
        await page.screenshot(path='quotes.png',full_page=True)
        await page.close()
        return {'quotes_count':len(response.css('div.quote'))}

print('URL SPIDER TEST')

configure_logging()
settings=get_project_settings()
runner = CrawlerRunner(settings)

@defer.inlineCallbacks
def crawl():
    breaker('SCRAPE STARTED')
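    # Wait for the crawl to finish, then stop the reactor
    # (the reactor has no close() method; stop() is the call that ends reactor.run()).
    yield runner.crawl(UrlSpider)
    reactor.stop()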

url_list=crawl()
print('crawled: '+str(url_list))
reactor.run()

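I have spent hours looking for a solution without success. I am using CrawlerRunner because I want to automate this code at some point, but I get the same error even with CrawlerProcess.
I am also using custom_settings because I ran into a problem where the project settings were not being picked up through get_project_settings; custom_settings lets me make sure the setting is actually applied.
If I remove the TWISTED_REACTOR setting from custom_settings, the spider scrapes and yields, but the reactor error occurs again and nothing is retrieved.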

kuuvgm7e  #1

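In your settings, find and comment out the AsyncioSelectorReactor line (given here as AsyncioSelectorReactor = '2.7'; presumably the TWISTED_REACTOR setting is meant).
It is among the last lines of my settings file.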
