Scrapy/Selenium jumps to the next page twice, then breaks (element not clickable); also, no data is extracted except from the first page

Posted by jexiocij on 2023-06-06 in Other


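I am trying to scrape a website. The program needs to jump to the next page until it reaches the last one, extracting data from each page. So far, extracting data from the start page works, and the program also jumps to pages 2 and 3 automatically. Two problems remain:
1. After the first two jumps the program stops with "element click intercepted: Element ... is not clickable at point (904, 603). Other element would receive the click". I don't understand this, because the first two clicks worked fine.
2. The program extracts the desired data (via the callback) only on the first page, not on the following pages. Any idea why?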

def parse(self, response):
 url = 'xxxx'
 self.driver = webdriver.Chrome('/Users/xxxx/chromedriver')
 self.driver.maximize_window() # For maximizing window
 self.driver.implicitly_wait(10) # gives an implicit wait for 10 seconds
 self.driver.get(url)

 while self.driver.find_elements_by_css_selector("body > div.container-fluid.main-container.bg-white.py-5 > section.maincontent.row > div > nav:nth-child(11) > ul > li:nth-child(7) > a"):

     sel = Selector(text=self.driver.page_source)

     single_joboffer = response.xpath(".//div[@class='col-12 col-md-10']/p[@class='inserattitel h2 mt-0']/a/@href")

     for joboffer in single_joboffer:
         url1 = response.urljoin(joboffer.extract())
         yield scrapy.Request(url1, callback = self.parse_dir_contents)

     element = self.driver.find_element_by_css_selector("body > div.container-fluid.main-container.bg-white.py-5 > section.maincontent.row > div > nav:nth-child(11) > ul > li:nth-child(7) > a")
     self.driver.execute_script("window.scrollBy(0,4000)","", element)
     sel = Selector(text=self.driver.page_source)
     sleep(5)
     self.driver.find_element_by_css_selector("body > div.container-fluid.main-container.bg-white.py-5 > section.maincontent.row > div > nav:nth-child(11) > ul > li:nth-child(7) > a").click()
 self.driver.close()

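I first tried an XPath for the next button, but that did not work: several pagination buttons match the same XPath, so the program jumped between pages at random. A CSS selector seems to be the way to go here. I also played with the sleep time, but that does not seem to have any effect.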

scrapy

Source: https://stackoverflow.com/questions/66852455/scrapy-selenium-jumps-twice-to-the-next-page-then-breaks-element-not-clickable

1 Answer


voj3qocg1#

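1. Problem fixed: after the first two pages the CSS structure of the pagination changes. On the first two pages it is ["|<<", "Previous", "1", "2", "3", "...", "Next", ">>|"]; from the third page on it becomes ["|<<", "Previous", "1", "2", "3", "4", "...", "Next", ">>|"]. I had to change nth-child() to nth-last-child(2) to always get the "Next" button.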
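To make the position shift concrete, here is a minimal sketch that models the two pager layouts from the answer as Python lists and compares counting from the start with counting from the end. The helpers `nth_child`, `nth_last_child`, and `click_next` are hypothetical names introduced for illustration. `NEXT_LINK` is the question's selector with only its last step changed, and it is assumed (not verified against the site) that the rest of the selector stays valid on every page.

```python
# Pager layouts as described in the answer (labels translated).
PAGER_PAGES_1_2 = ["|<<", "Previous", "1", "2", "3", "...", "Next", ">>|"]
PAGER_PAGE_3_ON = ["|<<", "Previous", "1", "2", "3", "4", "...", "Next", ">>|"]


def nth_child(items, n):
    """Model of CSS :nth-child(n): 1-based position counted from the start."""
    return items[n - 1]


def nth_last_child(items, n):
    """Model of CSS :nth-last-child(n): 1-based position counted from the end."""
    return items[-n]


# The question's selector with li:nth-child(7) replaced by li:nth-last-child(2).
NEXT_LINK = (
    "body > div.container-fluid.main-container.bg-white.py-5 > "
    "section.maincontent.row > div > nav:nth-child(11) > ul > "
    "li:nth-last-child(2) > a"
)


def click_next(driver):
    """Click the "Next" link; return False when the selector matches nothing.

    `driver` is a Selenium WebDriver. "css selector" is the string value of
    selenium's By.CSS_SELECTOR, spelled out so this sketch needs no imports.
    """
    links = driver.find_elements("css selector", NEXT_LINK)
    if not links:
        return False
    # Scroll the link itself into view rather than by a fixed pixel offset.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", links[0]
    )
    links[0].click()
    return True
```

In this model, `nth_child(PAGER_PAGE_3_ON, 7)` lands on the "..." entry, while `nth_last_child(..., 2)` lands on "Next" in both layouts, which matches the behaviour the answer describes. How the pager looks on the last page is not covered by the answer, so the loop's stop condition still needs checking there. Separately, the loop in the question builds `sel` from `driver.page_source` but then queries `response`, which still holds the first page; querying `sel` instead is a likely fix for the second problem, though the answer does not confirm it.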

Answered 2023-06-06
