Scrapy: how do I store all the collected stats before the spider closes?

a5g8bdjr posted on 2022-11-09 in Other

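I want to store all the stats collected by the spider in a single output file, in JSON format. However, I get this error:
'MemoryStatsCollector' object has no attribute 'get_all'
The documentation mentions stats.get_all as the method for getting everything that has been stored. What is the correct way to do this?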

import scrapy
from scrapy import signals
import jsonlines

class TestSpider(scrapy.Spider):
    name = 'stats'

    start_urls = ['http://quotes.toscrape.com']

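    def __init__(self, stats):
        super().__init__()  # keep Scrapy's own Spider initialisation
        self.stats = stats  # crawler.stats, passed in from from_crawler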

    @classmethod
    def from_crawler(cls, crawler, *args,**kwargs):
        #spider = super(TestSpider, cls).from_crawler(crawler, *args,**kwargs)
        stat = cls(crawler.stats)
        crawler.signals.connect(stat.spider_closed, signals.spider_closed)
        return stat

    def spider_closed(self):
        #self.stats = stat
        txt_file = 'some_text.jl'
        with jsonlines.open(txt_file, 'w') as f:
            # Raises AttributeError: 'MemoryStatsCollector' object has no attribute 'get_all'
            f.write(self.stats.get_all())

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse
            )

    def parse(self, response):
        content = response.xpath('//div[@class = "row"]')
        for items in content:
            yield {
                'some_items_links':items.xpath(".//a//@href").get()
            }
Answer 1, by fkaflof6

It turns out the stats collector has no get_all method; I had to call get_stats() instead. The documentation gives these methods:

  • stats.get_value()
  • stats.get_stats()
  • stats.max_value() / stats.min_value()
  • stats.inc_value()
  • stats.set_value()

Further details are in the Stats Collection section of the Scrapy documentation.
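To make those methods concrete without running a crawl, here is a minimal dict-backed sketch. ToyStats is a hypothetical stand-in, not Scrapy's StatsCollector: it is written to mirror the behaviour of the listed methods as I understand them, and it leaves out any extra arguments Scrapy's versions accept. In a real spider you would call the same method names on crawler.stats (stored as self.stats in the question's code).

```python
from typing import Any, Dict


class ToyStats:
    """Hypothetical dict-backed stand-in (NOT Scrapy's class) that mimics
    the stats-collector methods listed above."""

    def __init__(self) -> None:
        self._stats: Dict[str, Any] = {}

    def get_value(self, key: str, default: Any = None) -> Any:
        # Return one stat, or `default` if it was never recorded.
        return self._stats.get(key, default)

    def get_stats(self) -> Dict[str, Any]:
        # Return every stat at once (this toy returns a shallow copy).
        return dict(self._stats)

    def set_value(self, key: str, value: Any) -> None:
        # Overwrite the stat unconditionally.
        self._stats[key] = value

    def inc_value(self, key: str, count: int = 1, start: int = 0) -> None:
        # Add `count`, beginning from `start` if the key is new.
        self._stats[key] = self._stats.setdefault(key, start) + count

    def max_value(self, key: str, value: Any) -> None:
        # Keep the larger of the stored value and `value`.
        self._stats[key] = max(self._stats.setdefault(key, value), value)

    def min_value(self, key: str, value: Any) -> None:
        # Keep the smaller of the stored value and `value`.
        self._stats[key] = min(self._stats.setdefault(key, value), value)


stats = ToyStats()
stats.inc_value("item_scraped_count")
stats.inc_value("item_scraped_count")
stats.max_value("memusage/max", 100)
stats.max_value("memusage/max", 50)
stats.set_value("finish_reason", "finished")
print(stats.get_stats())
# {'item_scraped_count': 2, 'memusage/max': 100, 'finish_reason': 'finished'}
```

The point of the sketch is the split of responsibilities: get_value reads one key, get_stats hands back the whole dictionary, and the inc/max/min/set calls are the only ways values change.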
Working version:

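def spider_closed(self):
    txt_file = 'some_text.jl'
    with jsonlines.open(txt_file, 'w') as f:
        # Before: f.write(f'{self.stats.get_all()}')  -> AttributeError
        # get_stats() returns the whole stats dict. The f-string writes its str()
        # form (one JSON string, not a JSON object); datetime values such as
        # start_time are not JSON-serializable as-is.
        f.write(f'{self.stats.get_stats()}')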

Output:

{
    "log_count/INFO": 10,
    "log_count/DEBUG": 3,
    "start_time": datetime.datetime(2022, 7, 6, 16, 16, 30, 553373),
    "memusage/startup": 59895808,
    "memusage/max": 59895808,
    "scheduler/enqueued/memory": 1,
    "scheduler/enqueued": 1,
    "scheduler/dequeued/memory": 1,
    "scheduler/dequeued": 1,
    "downloader/request_count": 1,
    "downloader/request_method_count/GET": 1,
    "downloader/request_bytes": 223,
    "downloader/response_count": 1,
    "downloader/response_status_count/200": 1,
    "downloader/response_bytes": 2086,
    "httpcompression/response_bytes": 11053,
    "httpcompression/response_count": 1,
    "response_received_count": 1,
    "item_scraped_count": 1,
    "elapsed_time_seconds": 0.34008,
    "finish_time": datetime.datetime(2022, 7, 6, 16, 16, 30, 893453),
    "finish_reason": "finished",
}
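The output above still contains datetime.datetime(...) values, so it is not valid JSON: the working version stores the dict's Python str() form as a single JSON string. If you want each record to be a real JSON object, here is a minimal sketch, assuming you swap jsonlines for the standard json module and convert datetimes to ISO 8601 strings. stats_to_json_line and dump_stats are hypothetical helper names, not part of Scrapy.

```python
import json
from datetime import datetime
from typing import Any, Dict


def _json_default(value: Any) -> Any:
    # json.dumps calls this only for objects it cannot encode natively.
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


def stats_to_json_line(stats: Dict[str, Any]) -> str:
    # One stats dict -> one JSON Lines record (a JSON object, not a string).
    return json.dumps(stats, default=_json_default)


def dump_stats(stats: Dict[str, Any], path: str) -> None:
    # Hypothetical helper: overwrite `path` with a single JSON Lines record.
    with open(path, "w", encoding="utf-8") as f:
        f.write(stats_to_json_line(stats) + "\n")


if __name__ == "__main__":
    sample = {
        "item_scraped_count": 1,
        "start_time": datetime(2022, 7, 6, 16, 16, 30, 553373),
        "finish_reason": "finished",
    }
    print(stats_to_json_line(sample))
    # {"item_scraped_count": 1, "start_time": "2022-07-06T16:16:30.553373", "finish_reason": "finished"}
```

Inside spider_closed this would become dump_stats(self.stats.get_stats(), 'some_text.jl'). Using isoformat() keeps the timestamps machine-parseable; passing default=str to json.dumps would also work if readable text is all you need.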
