Blocked when using Scrapy (even with a User-Agent set)

Posted by voj3qocg 12 months ago in Other

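I'm trying to scrape the website of a recreational sports team in my country, but the site keeps blocking my Scrapy attempts. I've tried setting a user agent, without any success. When I run Scrapy, I get "429 Unknown Status" and never a 200 OK. I can open the site in my browser, so I know my IP isn't blocked. Any help would be greatly appreciated.
Here is the code I'm using: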

import scrapy
from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class QuoteSpider(CrawlSpider):
    name = "Quote"
    allowed_domains = ["avaldsnes.spoortz.no"]
    start_urls = ["https://avaldsnes.spoortz.no/portal/arego/club/7"]

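    # CrawlSpider uses parse internally, so the rule callback gets a different name
    rules = (Rule(LinkExtractor(allow=""), callback="parse_item", follow=True),)
    custom_settings = {"USER_AGENT": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}

    def parse_item(self, response):
        print(response.request.headers)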

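I tried to scrape the site's links, but none of my attempts succeeded. The user agent is currently set to Googlebot, but I have also tried a regular browser user agent.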
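One way to check the reasoning above (the IP is fine, so the block depends on how the request looks) is to fetch the page outside Scrapy twice: once with only a User-Agent and once with a fuller browser-like header set. This is a minimal sketch using only Python's standard library; `build_request`, `fetch_status`, `UA_ONLY` and `BROWSER_LIKE` are hypothetical names, and the header values are a subset of the ones in the answer. urllib adds a few headers of its own and does not look like a browser or like Scrapy at the TLS level, so any difference it shows is a hint, not proof.

```python
import urllib.error
import urllib.request

URL = "https://avaldsnes.spoortz.no/portal/arego/club/7"

# Only a User-Agent, as in the question
UA_ONLY = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
}

# User-Agent plus browser-like headers (a subset of the answer's list;
# Accept-Encoding is left out so the body is not compressed)
BROWSER_LIKE = {
    **UA_ONLY,
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}


def build_request(url: str, headers: dict) -> urllib.request.Request:
    """Build a GET request that carries the given headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_status(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Return the HTTP status code; urllib raises HTTPError for 4xx/5xx, so catch it."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Compare `fetch_status(build_request(URL, UA_ONLY))` with `fetch_status(build_request(URL, BROWSER_LIKE))`. If the first returns 429 and the second 200, the site is probably filtering on request headers rather than on the IP address.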

Answer by 9rbhqvlz:

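In this case you need to send a full set of browser-like request headers, not just the User-Agent. Setting them through DEFAULT_REQUEST_HEADERS applies them to every request the spider makes: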

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class QuoteSpider(CrawlSpider):
    name = "Quote"
    allowed_domains = ["avaldsnes.spoortz.no"]
    start_urls = ["https://avaldsnes.spoortz.no/portal/arego/club/7"]

    # Browser-like request headers; the User-Agent alone was not enough for this site
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        # Scrapy only decodes "br" responses if a Brotli package is installed
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.5",
        "Cache-Control": "no-cache",
        "Connection": "keep-alive",
        "DNT": "1",
        "Host": "avaldsnes.spoortz.no",
        "Pragma": "no-cache",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    }

    # Follow every link and pass each response to parse_item.
    # CrawlSpider uses parse internally, so the callback needs a different name.
    rules = (Rule(LinkExtractor(allow=""), callback="parse_item", follow=True),)

    # Applied to every request this spider sends
    custom_settings = {
        'DEFAULT_REQUEST_HEADERS': headers
    }

    def parse_item(self, response):
        # Show the headers that were actually sent with the request
        print(response.request.headers)

Output:

[scrapy.core.engine] DEBUG: Crawled (200) <GET https://avaldsnes.spoortz.no/portal/arego/club/7> (referer: None)
...
...
...
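The header set above looks as if it was copied from a browser's developer tools (an assumption). To reuse your own browser's headers instead of typing them out, a small helper can turn copied `Name: value` lines into the dict that goes into DEFAULT_REQUEST_HEADERS. `parse_raw_headers` is a hypothetical helper in plain Python with no Scrapy dependency; dropping `Host` by default is a design choice, since that header is tied to one domain while DEFAULT_REQUEST_HEADERS is applied to every request.

```python
def parse_raw_headers(raw: str, drop: tuple = ("host", "content-length")) -> dict:
    """Turn 'Name: value' lines (e.g. copied from browser DevTools) into a dict.

    Header names listed in `drop` (compared case-insensitively) are skipped:
    Host is domain-specific, and a GET request has no body, so a copied
    Content-Length would be wrong.
    """
    headers = {}
    for line in raw.splitlines():
        # Split at the first colon only, so values like URLs stay intact
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip blank lines, the request line ("GET /path HTTP/2"),
        # HTTP/2 pseudo-headers (":authority: ...") and dropped names
        if not sep or not name or " " in name or name.lower() in drop:
            continue
        headers[name] = value.strip()
    return headers
```

In the spider, `custom_settings = {"DEFAULT_REQUEST_HEADERS": parse_raw_headers(copied_text)}` would then replace the hand-written dict, where `copied_text` is whatever you pasted from the browser.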

相关问题