Using a private proxy with Scrapy

yrefmtwq  posted on 2022-12-13  in  Other

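I am running a custom-configured virtual machine as a proxy server (via squid), and I am now trying to use it for my scraper. I use scrapy-rotating-proxies to rotate through my list of IPs, but the proxy is marked dead on the very first attempt, even though I have verified that the proxy address is alive and working fine (I set it as the proxy in Firefox and browsed both HTTP and HTTPS pages).
Scrapy settings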

DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
    "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
    "scrapy_fake_useragent.middleware.RetryUserAgentMiddleware": 401,
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}

ROTATING_PROXY_LIST = ["X.X.X.X:3128"]

Scrapy log

2022-12-02 13:31:22 [scrapy.core.engine] INFO: Spider opened
2022-12-02 13:31:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-12-02 13:31:22 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-12-02 13:31:22 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2022-12-02 13:31:32 [rotating_proxies.expire] DEBUG: Proxy <http://X.X.X.X:3128> is DEAD
2022-12-02 13:31:32 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://www.johnlewis.com/header/api/config> with another proxy (failed 1 times, max retries: 5)
2022-12-02 13:31:32 [rotating_proxies.middlewares] WARNING: No proxies available; marking all proxies as unchecked

Settings I changed for squid

http_access allow all
via off
forwarded_for delete

Please advise on what the problem might be.
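To separate a proxy fault from a Scrapy configuration fault, the proxy can also be exercised from plain Python, outside Scrapy. The following is a minimal sketch using only the standard library; `normalize_proxy` and `check_proxy` are hypothetical helper names, and the scheme handling simply mirrors the `http://X.X.X.X:3128` form that appears in the log above. It does not reproduce Scrapy's headers or middleware chain, so a success here only shows that the proxy relays a basic client.

```python
import urllib.error
import urllib.request
from typing import Tuple


def normalize_proxy(address: str) -> str:
    """Prefix a bare host:port with http://; leave full URLs unchanged."""
    return address if "://" in address else f"http://{address}"


def check_proxy(address: str, url: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Fetch url through the given proxy and return (ok, detail)."""
    proxy = normalize_proxy(address)
    # Route both plain HTTP and HTTPS (via CONNECT) through the same proxy.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # An HTTP error status came back, either from the site or from the proxy.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts, refused connections, and failed tunnels.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `check_proxy("X.X.X.X:3128", "https://www.johnlewis.com/header/api/config")` with a real address from `ROTATING_PROXY_LIST` returns a success flag plus a short detail string. If this succeeds while Scrapy still reports the proxy as dead, the cause is more likely in the middleware configuration than in squid.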

omvjsjqw  #1

"scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
"scrapy.downloadermiddlewares.retry.RetryMiddleware": None,

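These middlewares were the problem. I cannot explain why Scrapy was able to process my requests without a proxy while these middlewares were enabled, but after disabling them I was able to run Scrapy with my proxy.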
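The answer does not show the final settings, so it is not certain which state of these two entries was the working one. A hedged way to find out is to run the spider once with the two `None` overrides and once without them. The sketch below assumes the middleware paths from the question; `BASE_MIDDLEWARES`, `BUILTIN_OVERRIDES`, `downloader_middlewares`, and `disable_builtins` are hypothetical names introduced only for illustration.

```python
# Middlewares from the question that are kept in both variants.
BASE_MIDDLEWARES = {
    "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
    "scrapy_fake_useragent.middleware.RetryUserAgentMiddleware": 401,
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}

# The two entries quoted in the answer; None switches a built-in middleware off.
BUILTIN_OVERRIDES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
}


def downloader_middlewares(disable_builtins: bool) -> dict:
    """Return a DOWNLOADER_MIDDLEWARES dict with or without the None overrides."""
    merged = dict(BASE_MIDDLEWARES)
    if disable_builtins:
        merged.update(BUILTIN_OVERRIDES)
    return merged


# In settings.py, flip the flag between runs and compare the proxy log lines.
DOWNLOADER_MIDDLEWARES = downloader_middlewares(disable_builtins=False)
ROTATING_PROXY_LIST = ["X.X.X.X:3128"]
```

Comparing the `rotating_proxies` log lines between the two runs shows whether the overrides are what causes the proxy to be marked dead.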
