The script below ends up returning "Attention Required! | Cloudflare" when I use response.css('title::text').get() as a test to fetch data.
Scrapy spider:
import scrapy


class DataSpider(scrapy.Spider):
    name = "avvo"

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before adding state.
        super().__init__(*args, **kwargs)
        self.called = False
        self.data = {}

    def start_requests(self):
        urls = [
            'https://www.avvo.com/attorneys/84025-ut-jason-hunter-284784/reviews.html',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Only record the page title for the first response.
        if not self.called:
            self.called = True
            self.data["website"] = response.css('title::text').get()
            yield self.data
Result:
'Attention Required! | Cloudflare'
1 Answer
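You can use Selenium to get past Cloudflare: the title in the result is Cloudflare's challenge page, not the page you requested.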
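
A minimal sketch of that idea, not a guaranteed fix. It assumes Selenium 4 is installed and that Chrome plus a matching driver are available on the machine. The names looks_like_cloudflare_block and fetch_title_with_selenium are made up for illustration, and Cloudflare may still challenge an automated browser.

```python
def looks_like_cloudflare_block(title):
    """Return True if a page title looks like Cloudflare's challenge page.

    Hypothetical helper: it only checks for the word "cloudflare" in the
    title, which matches the 'Attention Required! | Cloudflare' result above.
    """
    return title is not None and "cloudflare" in title.lower()


def fetch_title_with_selenium(url, wait_seconds=10):
    """Load url in a real browser and return the page title.

    Hypothetical helper; assumes Chrome and its driver are available.
    """
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        try:
            # Give any challenge page a chance to finish before reading the title.
            WebDriverWait(driver, wait_seconds).until(
                lambda d: not looks_like_cloudflare_block(d.title)
            )
        except TimeoutException:
            # Still blocked after waiting; the caller can check the title.
            pass
        return driver.title
    finally:
        # Always close the browser, even if loading the page failed.
        driver.quit()
```

You would call fetch_title_with_selenium with the URL from the spider and then check the returned title with looks_like_cloudflare_block to see whether the challenge page was still served.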