I am trying to scrape a website for practice, but I keep getting an HTTP 403 error. How do I get the request accepted?
Here is my code:
from typing import List

import scrapy


class ResearchSpider(scrapy.Spider):
    name = 'pesquisa'
    start_urls: List[str] = [
        'https://www.imovelweb.com.br/imoveis-aluguel-paraiba.html?iv_=__iv_p_1_a_17808488596_g_139189246037_w_dsa-1687663569069_h_20089_ii_20098_d_c_v__n_g_c_611609016411_k__m__l__t__e__r__vi__'
    ]

    def parse(self, response):
        # Each listing card on the results page
        for pesquisa in response.css('.js-listing-labels-link'):
            yield {
                'address': pesquisa.css('.property-card__address::text').get(),
                'area': pesquisa.css('.js-property-card-detail-area::text').get(),
                'rooms': pesquisa.css('.js-property-detail-rooms .js-property-card-value::text').get(),
                'bathroom': pesquisa.css('.js-property-detail-bathroom .js-property-card-value::text').get(),
                'garages': pesquisa.css('.js-property-detail-garages .js-property-card-value::text').get(),
                'prices': pesquisa.css('p::text').get()[5:-1],
            }
In the terminal I ran:
scrapy shell
fetch('https://www.imovelweb.com.br/imoveis-paraiba.html')
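Still in the shell, you can retry the same fetch with browser-like headers to check whether the 403 is only a header check. This is a minimal sketch; the header values are illustrative assumptions, and a Cloudflare-protected site may still refuse the request:

# Run inside `scrapy shell` -- build a Request with browser-like headers and fetch it.
import scrapy

req = scrapy.Request(
    'https://www.imovelweb.com.br/imoveis-paraiba.html',
    headers={
        # Illustrative values only, not headers known to bypass the block
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
        'Accept-Language': 'pt-BR,pt;q=0.9,en;q=0.8',
    },
)
fetch(req)          # scrapy shell's fetch() also accepts a Request object
response.status     # see whether the response is still 403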
The error I get is:
2022-09-16 14:11:14 [filelock] DEBUG: Attempting to release lock 1292395054608 on C:\Users\Familia\anaconda3\lib\site-packages\tldextract\.suffix_cache/publicsuffix.org-tlds\de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-09-16 14:11:14 [filelock] DEBUG: Lock 1292395054608 released on C:\Users\Familia\anaconda3\lib\site-packages\tldextract\.suffix_cache/publicsuffix.org-tlds\de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-09-16 14:11:16 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.imovelweb.com.br/imoveis-paraiba.html> (referer: None)
1 Answer
The website is protected by Cloudflare. Its bot protection rejects requests that do not look like they come from a real browser, which is most likely why the plain Scrapy request comes back with a 403.
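A common first step is to send browser-like headers from the spider itself via Scrapy's per-spider settings. The sketch below uses real Scrapy settings (USER_AGENT, DEFAULT_REQUEST_HEADERS), but the header values are assumptions and may not be enough if Cloudflare enforces a JavaScript challenge; in that case a browser-based renderer or a scraping proxy service is usually needed:

import scrapy


class ResearchSpider(scrapy.Spider):
    name = 'pesquisa'
    start_urls = ['https://www.imovelweb.com.br/imoveis-paraiba.html']

    # Per-spider overrides of Scrapy settings. The header values are
    # illustrative assumptions; Cloudflare may still block the request.
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
        'DEFAULT_REQUEST_HEADERS': {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'pt-BR,pt;q=0.9,en;q=0.8',
        },
    }

    def parse(self, response):
        # Same extraction logic as in the question would go here
        ...

If the 403 persists with realistic headers, the block is almost certainly Cloudflare's browser check rather than a missing header, and no combination of plain HTTP headers will get the page.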