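I am trying to scrape product details from this page, https://www.goo-net.com/php/search/summary.php, using Scrapy with Selenium (scrapy-selenium).
Because I want the details of every product, I first collect all the product URLs from the listing page, then pass each URL to a second method through a callback to scrape all the information on that page.
I have tried many solutions, but my output is always empty.
Here is my code: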
import scrapy
import selenium
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.keys import Keys


class Goonet1Spider(scrapy.Spider):
    name = 'goonet1'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.goo-net.com/php/search/summary.php',
            wait_time=4,
            callback=self.parse
        )

    def parse(self, response):
        links = response.xpath("//*[@class='heading_inner']/h3/a")
        url_detail = []
        for link in links:
            url = response.urljoin(link.xpath(".//@href").get())
            url_detail.append(url)
        for i in url_detail:
            yield SeleniumRequest(
                url=i,
                wait_time=4,
                callback=self.parse_item
            )

    def parse_item(self, response):
        base_price = response.xpath("//table[@class='mainData']/tbody/tr[2]/td[1]/span/text()").get()
        yield {
            'base_price': base_price
        }
Here is my settings.py:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
# SELENIUM
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS=['-headless'] # '--headless' if using chrome instead of firefox
Please help me.
1 Answer
pbpqsu0x #1
Add the base URL to url_detail to complete the links:
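As a standalone illustration of what "completing" a link against a base URL means, here is a minimal sketch using the standard library's `urllib.parse.urljoin`. The `BASE_URL` value, the `complete_links` helper, and the sample hrefs are assumptions made for this example; they are not taken from the site or from the original spider.

```python
from urllib.parse import urljoin

# Assumption: the site root serves as the base URL for relative links.
BASE_URL = "https://www.goo-net.com"


def complete_links(hrefs, base_url=BASE_URL):
    """Join each href with base_url; already-absolute URLs pass through unchanged."""
    # Skip None or empty hrefs, which XPath .get() can return when no attribute matches.
    return [urljoin(base_url, href) for href in hrefs if href]


# Hypothetical hrefs, standing in for what link.xpath(".//@href").get() might return.
sample_hrefs = [
    "/usedcar/spread/goo/13/example.html",             # relative path (made up)
    "https://www.goo-net.com/php/search/summary.php",  # already absolute
    None,                                              # missing href
]

url_detail = complete_links(sample_hrefs)
for url in url_detail:
    print(url)
```

In the spider, the resulting absolute URLs would be the values passed as `url=` to each `SeleniumRequest`.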