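My spider scrapes the first page, but when it moves on to the second page it raises KeyError: 'driver'.
Is there a solution for this? I want to build a web crawler using Scrapy with scrapy-selenium. This is the page link: https://barreau-montpellier.com/annuaire-professionnel/?cn-s My code looks like this: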
import scrapy
from scrapy import Selector
from scrapy_selenium import SeleniumRequest


class TestSpider(scrapy.Spider):
    name = 'test'
    page_number=1

    def start_requests(self):
        yield SeleniumRequest(url='https://barreau-montpellier.com/annuaire-professionnel/?cn-s=',callback=self.parse)

    def parse(self, response):
        driver=response.meta['driver']
        r = Selector(text=driver.page_source)
        details=r.xpath("//div[@class='cn-entry cn-background-gradient']")
        for detail in details:
            email=detail.xpath(".//span[@class='email cn-email-address']//a//@href").get()
            try:
                email=email.replace("mailto:","")
            except:
                email=''
            n1=detail.xpath(".//span[@class='given-name']//text()").get()
            n2=detail.xpath(".//span[@class='family-name']//text()").get()
            name=n1+n2
            telephone=detail.xpath(".//span[@class='tel cn-phone-number cn-phone-number-type-workphone']//a//text()").get()
            fax=detail.xpath(".//span[@class='tel cn-phone-number cn-phone-number-type-workfax']//a//text()").get()
            street=detail.xpath(".//span[@class='adr cn-address']//span[@class='street-address notranslate']//text()").get()
            locality=detail.xpath(".//span[@class='adr cn-address']//span[@class='locality']//text()").get()
            code=detail.xpath(".//span[@class='adr cn-address']//span[@class='postal-code']//text()").get()
            address=street+locality+code
            yield{
                'name':name,
                'mail':email,
                'telephone':telephone,
                'Fax':fax,
                'address':address
            }
        next_page = 'https://barreau-montpellier.com/annuaire-professionnel/pg/'+ str(TestSpider.page_number)+'/?cn-s'
        if TestSpider.page_number<=155:
            TestSpider.page_number += 1
            yield response.follow(next_page, callback = self.parse,)
In settings.py, I added the following:
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which(r'C:\Program Files (x86)\chromedriver.exe')  # raw string so the backslashes are not read as escapes
SELENIUM_DRIVER_ARGUMENTS=['--headless']
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
1 Answer
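Why are you getting KeyError: 'driver'? I tested your code several times, and the pagination is the most likely cause. Have you tried running your code without the pagination part? I got the same KeyError: 'driver', and it disappeared once I removed the pagination. response.follow() creates a plain scrapy.Request, which the scrapy-selenium middleware does not process, so response.meta has no 'driver' key on the second page. To fix the next-page logic, I moved the pagination into def start_requests(self) and built the page URLs with the range function. It works fine without any problems, and in my testing this kind of pagination was about twice as fast as the next-page approach. Full working code: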
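A minimal sketch of the range-based pagination described above, not the answerer's exact listing. It assumes the question's URL pattern (`/pg/<n>/?cn-s`) is valid for pages 1 to 155 and that page 1 is the first listing page. `page_urls` and `join_parts` are hypothetical helper names introduced here for illustration. The guarded import only exists so the helpers can be used without Scrapy installed.

```python
BASE_URL = "https://barreau-montpellier.com/annuaire-professionnel/"
LAST_PAGE = 155  # assumption: taken from the question's `page_number <= 155` check


def page_urls(last_page=LAST_PAGE):
    """Yield one listing URL per page, following the question's URL pattern."""
    for page in range(1, last_page + 1):
        yield f"{BASE_URL}pg/{page}/?cn-s"


def join_parts(*parts):
    """Join text fragments with spaces, skipping None or blank values from .get()."""
    return " ".join(p.strip() for p in parts if p and p.strip())


try:
    import scrapy
    from scrapy_selenium import SeleniumRequest
except ImportError:  # Scrapy or scrapy-selenium missing: only the helpers are usable
    scrapy = None

if scrapy is not None:

    class TestSpider(scrapy.Spider):
        name = "test"

        def start_requests(self):
            # Every page is requested as a SeleniumRequest, so each one passes
            # through the Selenium middleware instead of being a plain Request.
            for url in page_urls():
                yield SeleniumRequest(url=url, callback=self.parse)

        def parse(self, response):
            # The response body is the rendered page source, so it can be queried
            # directly instead of re-reading the shared driver.
            for detail in response.xpath("//div[@class='cn-entry cn-background-gradient']"):
                adr = ".//span[@class='adr cn-address']"
                tel = ".//span[@class='tel cn-phone-number cn-phone-number-type-{}']//a//text()"
                email = detail.xpath(".//span[@class='email cn-email-address']//a//@href").get() or ""
                yield {
                    "name": join_parts(
                        detail.xpath(".//span[@class='given-name']//text()").get(),
                        detail.xpath(".//span[@class='family-name']//text()").get(),
                    ),
                    "mail": email.replace("mailto:", ""),
                    "telephone": detail.xpath(tel.format("workphone")).get(),
                    "Fax": detail.xpath(tel.format("workfax")).get(),
                    "address": join_parts(
                        detail.xpath(adr + "//span[@class='street-address notranslate']//text()").get(),
                        detail.xpath(adr + "//span[@class='locality']//text()").get(),
                        detail.xpath(adr + "//span[@class='postal-code']//text()").get(),
                    ),
                }
```

If you would rather keep the next-page logic inside parse, yielding a SeleniumRequest there instead of response.follow() should avoid the same KeyError, since only SeleniumRequest objects are handled by the middleware.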
Output: