KeyError: 'driver' when running a Scrapy spider with Selenium from a main file

Posted by prdp8dxp on 2022-11-09 in Other

I have a problem with my Python script. When I run the spider with scrapy runspider Myspider, it works, but when I run it from my main file I get the following error: KeyError: 'driver'

Settings file:

SELENIUM_DRIVER_NAME = 'chrome'

# SELENIUM_DRIVER_EXECUTABLE_PATH = '/home/PATH/OF/FILE/chromedriver'

SELENIUM_DRIVER_ARGUMENTS=['--headless']

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

My spider file:

class MySpider(scrapy.Spider):
    name = 'my_spider'

    def __init__(self, list_urls, *args,**kwargs):
        super(my_spider, self).__init__(*args,**kwargs)
        self.urls = list_urls

    def start_requests(self):
        for url in self.urls:
            yield SeleniumRequest(
                url = url['link'],
                callback = self.parse,
                wait_time = 15,
            )

And my main file:

import scrapy
import classListUrls
from scrapy.crawler import CrawlerProcess
from dir.spiders import Spider

URL = "example.com"
urls = classListUrls.GenListUrls(URL)

process = CrawlerProcess()
process.crawl(Spider.my_spider, list_urls = urls.list_urls())
process.start()

I don't understand why this error occurs.

Tags: scrapy

Source: https://stackoverflow.com/questions/73754672/keyerror-driver-scrapy-and-selenium-together-when-i-run-my-spider-in-main-fil

1 answer

mcvgt66p1#

One problem I see is that the first argument to process.crawl should be the spider class, not the spider name.

process.crawl(Spider.MySpider, list_urls=urls.list_urls())

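The same applies to the superclass call in the spider's __init__, although the better option is to leave the super() arguments empty, since the current class is already the default.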

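class MySpider(scrapy.Spider):
    def __init__(self, *args, list_urls=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Keep the URLs on the instance, as start_requests in the question expects.
        self.urls = list_urls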
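
To illustrate why the original super(my_spider, self) call fails, here is a minimal sketch that does not need Scrapy. Base is a hypothetical stand-in for scrapy.Spider, and BrokenSpider, FixedSpider and try_build are names made up for this example. The point is only that the value of the name attribute is not a Python identifier, so using it inside super() raises NameError.

```python
class Base:
    """Hypothetical stand-in for scrapy.Spider so the sketch runs without Scrapy."""

    def __init__(self, *args, **kwargs):
        self.extra = kwargs


class BrokenSpider(Base):
    name = "broken_spider"

    def __init__(self, list_urls, *args, **kwargs):
        # `broken_spider` is the value of `name`, not a defined Python name,
        # so this lookup raises NameError when the spider is instantiated.
        super(broken_spider, self).__init__(*args, **kwargs)
        self.urls = list_urls


class FixedSpider(Base):
    name = "fixed_spider"

    def __init__(self, *args, list_urls=None, **kwargs):
        # Zero-argument super() resolves the current class automatically.
        super().__init__(*args, **kwargs)
        self.urls = list_urls or []


def try_build(cls, **kwargs):
    """Return the new instance, or the exception class name if construction fails."""
    try:
        return cls(**kwargs)
    except NameError as exc:
        return type(exc).__name__
```

Calling try_build(BrokenSpider, list_urls=[]) reports the NameError, while try_build(FixedSpider, list_urls=[...]) returns a spider whose urls attribute holds the list that was passed in.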

Another thing is that CrawlerProcess needs to be constructed with a settings object, because settings.py is not read when the spider is started from the main file.

process = CrawlerProcess(settings={"SELENIUM_DRIVER_NAME": 'chrome',
                                   "SELENIUM_DRIVER_ARGUMENTS": ['--headless'],
                                   "DOWNLOADER_MIDDLEWARES": {'scrapy_selenium.SeleniumMiddleware': 800}})
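
Putting the points above together, the main file could look roughly like the sketch below. The helper names selenium_settings and run are made up for this example, and the headless parameter is an added convenience that is not in the original settings. The likely reason for the KeyError is that, without these settings, the Selenium middleware is never enabled and so nothing adds a 'driver' entry to the request metadata, but that is an assumption, because the parse callback is not shown in the question.

```python
def selenium_settings(headless=True):
    """Return the scrapy-selenium settings from the question as a plain dict."""
    return {
        "SELENIUM_DRIVER_NAME": "chrome",
        "SELENIUM_DRIVER_ARGUMENTS": ["--headless"] if headless else [],
        "DOWNLOADER_MIDDLEWARES": {"scrapy_selenium.SeleniumMiddleware": 800},
    }


def run(spider_cls, list_urls):
    """Run one spider class in-process with the Selenium settings applied."""
    # Imported lazily so selenium_settings() can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=selenium_settings())
    # Pass the spider class itself, not the value of its `name` attribute.
    process.crawl(spider_cls, list_urls=list_urls)
    process.start()  # blocks until the crawl has finished
```

With the modules from the question, the call would be something like run(Spider.MySpider, urls.list_urls()).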

Answered 2022-11-09