Scrapy cannot scrape data

cgh8pdjw posted on 2021-08-20 in Java


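I am trying to scrape data from the real-estate website https://www.spitogatos.gr/. What I see in its robots.txt is: "The Ultimate robots.txt Bot and User-Agent Blocker". I only want to scrape the site once a day. Is there a way to do this with Scrapy? Thanks in advance.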

import scrapy


class MainprojectSpider(scrapy.Spider):
    name = 'mainProject'
    allowed_domains = ['www.spitogatos.gr']
    # Adjacent string literals are joined, so the user agent stays one string.
    user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36')
    # start_urls = ['https://www.spitogatos.gr/']

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.spitogatos.gr',
            callback=self.parse,
            # The header name is 'User-Agent' (hyphenated), not 'User Agent'.
            headers={'User-Agent': self.user_agent},
        )

    def parse(self, response):
        # Just a dummy check that something comes back.
        print(response.xpath('//h2[@class="text thin h1"]/text()').extract())

    def set_user_agent(self, request):
        # Never called by Scrapy or by this spider; kept from the original attempt.
        request.headers['User-Agent'] = self.user_agent
        return request
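
Since the question turns on what the site's robots.txt allows for a given user agent, here is a minimal sketch of evaluating such rules offline with Python's standard `urllib.robotparser`. The robots.txt content, the `BadBot` name, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and chosen for illustration; they are not taken from spitogatos.gr, whose actual rules would need to be checked separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only,
# NOT copied from spitogatos.gr.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    browser_ua = "Mozilla/5.0"
    print(is_allowed(SAMPLE_ROBOTS_TXT, "BadBot", "https://example.com/"))
    print(is_allowed(SAMPLE_ROBOTS_TXT, browser_ua, "https://example.com/listing"))
    print(is_allowed(SAMPLE_ROBOTS_TXT, browser_ua, "https://example.com/private/x"))
```

A check like this only shows what the rules say about a particular user agent and path. Whether a once-a-day crawl is acceptable still depends on the site's real robots.txt and its terms of use.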
