如何修复Scrapy-Selenium不产生输出?

g9icjywg  于 2022-11-09  发布在  其他
关注(0)|答案(1)|浏览(140)

Selenium请求工作,但不使用scrapy-selenium。页面加载,我从网站得到一个200的响应,但我没有得到错误,因为它没有产生任何输出。

class SeamdbTestSpider(scrapy.Spider):
    name = 'steam_db_test'
    start_urls = ['https://steamdb.info/graph/']

    def start_requests(self):

        for link in self.start_urls:
            yield SeleniumRequest(
                url=link, 
                wait_time= 10,        
                callback=self.parse)

    def parse(self, response):
        driver = response.meta['driver']
        initial_page = driver.page_source
        r = Selector(text=initial_page)
        table = r.xpath('//*[@id="table-apps"]/tbody')
        rows = table.css('tr[class= "app"]')[0:2]

        for element in rows:
            info_link = "https://steamdb.info" + element.css('::attr(href)').get()
            name = element.css('a ::text').get()
            yield {"Name": name, "Link": info_link}
kgqe7b3p

kgqe7b3p1#

实际上,SeleniumRequest与Scrappy并不总是完美的。同样的selement选择是Workking selenium 与bs 4,但得到空输出像你沿着与Scrappy。
Scrapy-SeleniumRequest不工作

import scrapy
from scrapy import Selector
from scrapy_selenium import SeleniumRequest

class SeamdbTestSpider(scrapy.Spider):
    name = 'steam_db_test'
    start_urls = ['https://steamdb.info/graph/']

    def start_requests(self):

        for link in self.start_urls:
            yield SeleniumRequest(
                url=link, 
                wait_time= 10,        
                callback=self.parse)

    def parse(self, response):
        driver = response.meta['driver']
        initial_page = driver.page_source
        r = Selector(text=initial_page)
        rows = r.css('table#table-apps tbody tr')

        for element in rows:
            info_link = "https://steamdb.info" + element.css('td:nth-child(3) > a::attr(href)').get()
            name = element.css('td:nth-child(3) > a::text').get()
            yield {"Name": name, "Link": info_link}

selenium 与bs 4是工作正常:

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")

# chrome to stay open

options.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),options=options)

driver.get("https://steamdb.info/graph/")
time.sleep(5)

soup = BeautifulSoup(driver.page_source, 'lxml')

for tr in soup.select('table#table-apps tbody tr'):
    link=tr.select_one('td:nth-child(3) > a').get('href')
    link="https://steamdb.info" +  link
    name = tr.select_one('td:nth-child(3) > a').text
    print(link)
    print(name)

输出:

https://steamdb.info/app/730/graphs/
Counter-Strike: Global Offensive
https://steamdb.info/app/570/graphs/
Dota 2
https://steamdb.info/app/578080/graphs/
PUBG: BATTLEGROUNDS
https://steamdb.info/app/1172470/graphs/
Apex Legends
https://steamdb.info/app/1599340/graphs/
Lost Ark
https://steamdb.info/app/271590/graphs/
Grand Theft Auto V
https://steamdb.info/app/440/graphs/
Team Fortress 2
https://steamdb.info/app/1446780/graphs/
MONSTER HUNTER RISE
https://steamdb.info/app/346110/graphs/
ARK: Survival Evolved
https://steamdb.info/app/252490/graphs/
Rust
https://steamdb.info/app/431960/graphs/
Wallpaper Engine
https://steamdb.info/app/1506830/graphs/
FIFA 22
https://steamdb.info/app/1085660/graphs/
Destiny 2
https://steamdb.info/app/1569040/graphs/
Football Manager 2022
https://steamdb.info/app/230410/graphs/
Warframe
https://steamdb.info/app/1203220/graphs/
NARAKA: BLADEPOINT
https://steamdb.info/app/359550/graphs/
Tom Clancy's Rainbow Six Siege
https://steamdb.info/app/381210/graphs/
Dead by Daylight
https://steamdb.info/app/236390/graphs/

...等等

相关问题