I've been using Crawlera with Scrapy and it worked great. However, I changed the API key in the Crawlera control panel, and since then I haven't been able to use Crawlera. I contacted their customer support, and they said the API key is working fine. I decided to try the example from the Scrapy documentation with Crawlera. No luck. Scrapy is making requests directly to dmoz.org instead of going through paygo.crawlera.com. I have scrapy and scrapy-crawlera installed.
Here is the log:
[scrapy] INFO: Using crawlera at http://paygo.crawlera.com:8010?noconnect (user: [my_api_key])
2015-08-10 19:16:24 [scrapy] DEBUG: Telnet console listening on [my_ip_address]
2015-08-10 19:16:26 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2015-08-10 19:16:26 [scrapy] INFO: Closing spider (finished)
2015-08-10 19:16:26 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 660,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 16445,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2015, 8, 11, 2, 16, 26, 990760),
'log_count/DEBUG': 3,
'log_count/INFO': 8,
'log_count/WARNING': 2,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2015, 8, 11, 2, 16, 24, 720987)}
2015-08-10 19:16:26 [scrapy] INFO: Spider closed (finished)
Any help or ideas as to why this is happening would be greatly appreciated.
# settings file
BOT_NAME = 'tutorial'
SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'
DOWNLOADER_MIDDLEWARES = {'scrapy_crawlera.CrawleraMiddleware': 600}
CRAWLERA_ENABLED = True
CRAWLERA_USER = '[my_api_key]'
CRAWLERA_PASS = ''
CRAWLERA_PRESERVE_DELAY = True
CONCURRENT_REQUESTS = 32
CONCURRENT_REQUESTS_PER_DOMAIN = 32
AUTOTHROTTLE_ENABLED = False
DOWNLOAD_TIMEOUT = 600
# items file
import scrapy
class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()
# spider file
import scrapy
class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
1 Answer
In your settings.py file, you need to configure DOWNLOADER_MIDDLEWARES.
For example:
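A minimal sketch of that configuration, mirroring the setting names the question already uses (the key value is a placeholder, not a real credential):

# settings.py -- enable the scrapy-crawlera downloader middleware
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 600,
}
CRAWLERA_ENABLED = True
CRAWLERA_USER = '<your_api_key>'  # the API key goes in the user field
CRAWLERA_PASS = ''                # password stays empty when authenticating by API key

With CRAWLERA_ENABLED set to True and the middleware registered in DOWNLOADER_MIDDLEWARES, the middleware should route requests through the Crawlera proxy; if either is missing, Scrapy silently fetches the pages directly, which matches the log in the question.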