Is there a way to use Scrapy-Splash without Docker? What I mean is, I have a server running Python 3 with no Docker installed, and if possible I would rather not install Docker on it.
Also, what exactly is SPLASH_URL? Can I just use my server's IP?
Here is what I have tried:
from scrapy_splash import SplashRequest

def start_requests(self):
    url = ["europages.fr/entreprises/France/pg-20/resultats.html?ih=01510;01505;01515;01525;01530;01570;01565;01750;01590;01595;01575;01900;01920;01520;01905;01585;01685;01526;01607;01532;01580;01915;02731;01700;01600;01597;01910;01906"]
    print(url)
    yield SplashRequest(url='https://' + url[0], callback=self.parse_all_links,
                        args={
                            # optional; parameters passed to the Splash HTTP API
                            'wait': 0.5,
                            # 'url' is prefilled from the request url
                            # 'http_method' is set to 'POST' for POST requests
                            # 'body' is set to the request body for POST requests
                        }  # optional; the default endpoint is render.html
                        )  # TODO: change the callback
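For context, the method above sits inside a Scrapy spider roughly like the sketch below; the class name, spider name and the body of parse_all_links are placeholders I am adding for illustration, not code I already have working:

import scrapy
from scrapy_splash import SplashRequest

class EuropagesSpider(scrapy.Spider):
    # Placeholder spider; only start_requests mirrors the snippet above.
    name = "europages"

    def start_requests(self):
        url = "https://europages.fr/entreprises/France/pg-20/resultats.html"
        # Ask Splash to render the page and wait 0.5 s for JavaScript to finish.
        yield SplashRequest(url, callback=self.parse_all_links, args={"wait": 0.5})

    def parse_all_links(self, response):
        # response.text is the HTML as rendered by Splash.
        for href in response.css("a::attr(href)").getall():
            yield {"link": href}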
And in settings.py:
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    # 'Europages.middlewares.EuropagesDownloaderMiddleware': 543,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
and
SPLASH_URL = "my server's URL"
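From the scrapy-splash README, SPLASH_URL seems to be expected to be an http address with a host and port rather than just a bare IP, so I assume it should look something like the line below (the IP is only a placeholder for my server, and 8050 is Splash's default HTTP port):

# Placeholder address; Splash listens on port 8050 by default.
SPLASH_URL = 'http://192.0.2.10:8050'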
I hope my post is clear.
Thanks.
1 Answer
It looks like this used to be possible with older versions of Splash, but not anymore: the linked installation docs only describe running Splash through Docker (https://splash.readthedocs.io/en/3.3.1/install.html).