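I am getting the error "HTTP status code is not handled or not allowed". How can I resolve it? I am using Selenium together with Scrapy, and I have also set a user agent in the settings, but the HTTP error persists. Please recommend a solution. This is the page link: https://www.askgamblers.com/online-casinos/countries/uk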
import scrapy
from scrapy.http import Request
from selenium import webdriver
import time
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        options = webdriver.ChromeOptions()
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-gpu")
        options.add_argument("--window-size=1920,1080")
        options.add_argument("--disable-extensions")
        # Pass the options to the driver so the arguments above take effect.
        driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install()),
            options=options,
        )
        URL = 'https://www.askgamblers.com/online-casinos/countries/uk'
        driver.get(URL)
        time.sleep(3)
        # Collect the casino links from the listing cards.
        page_links = driver.find_elements(By.XPATH, "//div[@class='card__desc']//a[starts-with(@href, '/online')]")
        for link in page_links:
            href = link.get_attribute("href")
            # These requests go through Scrapy's downloader, not the browser.
            yield scrapy.Request(href)
        driver.quit()

    def parse(self, response):
        # response.css() takes only a CSS selector string.
        title = response.css("h1.ch-title::text").get()
        yield {
            'title': title
        }
1 Answer
You are getting this error because the website is behind Cloudflare protection.
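One way to check this for yourself: the "HTTP status code is not handled or not allowed" line is logged by Scrapy's HttpErrorMiddleware when it drops a non-2xx response. A minimal settings sketch that lets such responses through to the callback so they can be inspected is below. The codes 403 and 503 are an assumption about what a Cloudflare block commonly returns, not something confirmed for this site.

```python
# settings.py (fragment)
# Let these non-2xx responses reach the spider callback instead of being
# dropped by HttpErrorMiddleware. 403 and 503 are assumed here; use the
# status code that your own Scrapy log actually reports.
HTTPERROR_ALLOWED_CODES = [403, 503]
```

With this in place, `response.status` and `response.headers.get("Server")` can be logged inside `parse` to see what the server sent back. This only exposes the blocked response for inspection; it does not get past the protection.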
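Neither Scrapy on its own nor Scrapy combined with Selenium can get past the Cloudflare protection (I tested both); only the full Selenium engine, driving the browser for every page, can do the job. Finally, I integrated bs4 with Selenium to parse the content in a more robust way. Script: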
Output:
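The approach described in this answer (load the page with Selenium, then parse the rendered HTML with bs4) could be sketched roughly as follows. This is not the answerer's original script. The function names `fetch_rendered_html` and `extract_casino_links` are hypothetical, the CSS selector is adapted from the XPath in the question, and whether a plain Chrome session actually passes the Cloudflare check may vary.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

START_URL = "https://www.askgamblers.com/online-casinos/countries/uk"


def fetch_rendered_html(url, wait_seconds=15):
    """Load `url` in a real Chrome browser and return the rendered HTML."""
    # Imported here so the parsing helper below can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--window-size=1920,1080")
    # Assumes Selenium 4.6+, which locates a matching chromedriver itself.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for the listing cards instead of sleeping for a fixed time.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.card__desc a"))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_casino_links(html, base_url=START_URL):
    """Return absolute URLs of the links found inside the listing cards."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Same intent as the XPath in the question: anchors under card__desc
    # whose href starts with '/online'.
    for anchor in soup.select("div.card__desc a[href^='/online']"):
        links.append(urljoin(base_url, anchor["href"]))
    return links


if __name__ == "__main__":
    for link in extract_casino_links(fetch_rendered_html(START_URL)):
        print(link)
```

Keeping the browser step and the parsing step in separate functions means the parser can be checked against saved HTML without launching Chrome. To read each detail page, the same `fetch_rendered_html` call would be repeated per link rather than handing the URLs back to Scrapy's downloader.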