Scrapy - logging to file and stdout simultaneously, with spider names

Posted by 0tdrvxhp on 2022-11-09 in Other


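I decided to use the Python logging module because the messages Twisted generates on std error are too long. I want the meaningful INFO-level messages, such as those generated by the StatsCollector, to be written to a separate log file while the on-screen messages are kept.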

from twisted.python import log
import logging
logging.basicConfig(level=logging.INFO, filemode='w', filename='buyerlog.txt')
observer = log.PythonLoggingObserver()
observer.start()

This works and I get my messages, but the downside is that I cannot tell which spider generated each message! This is my log file; %(name)s displays "twisted":

INFO:twisted:Log opened.
INFO:twisted:Scrapy 0.12.0.2543 started (bot: property)
INFO:twisted:scrapy.telnet.TelnetConsole starting on 6023
INFO:twisted:scrapy.webservice.WebService starting on 6080
INFO:twisted:Spider opened
INFO:twisted:Spider opened
INFO:twisted:Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Dumping spider stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 3,
 'downloader/request_bytes': 9973,

Compare this with the messages Twisted generates on standard error:

2011-12-16 17:34:56+0800 [expats] DEBUG: number of rules: 4
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-12-16 17:34:56+0800 [iproperty] INFO: Spider opened
2011-12-16 17:34:56+0800 [iproperty] DEBUG: Redirecting (301) to <GET http://www.iproperty.com.sg/> from <GET http://iproperty.com.sg>
2011-12-16 17:34:57+0800 [iproperty] DEBUG: Crawled (200) <

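I have tried %(name)s, %(module)s, etc., but I can't seem to get the spider name to show. Does anyone know the answer?
EDIT: The problem with using LOG_FILE and LOG_LEVEL in the settings is that lower-level messages are then no longer shown on std error.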
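For reference, Scrapy 1.0 and later log through Python's standard `logging` module, and a spider's `self.logger` wraps a logger named after the spider, so `%(name)s` shows the spider name for messages logged that way. The EDIT then comes down to giving each handler its own level. The minimal sketch below uses only the standard library; the helper `setup_logging`, the file name `crawl.log`, and the logger name `expats` are hypothetical stand-ins.

```python
import logging
import sys


def setup_logging(log_path: str) -> None:
    """Send DEBUG+ to the console and INFO+ to a file, both tagged with the logger name."""
    # %(name)s is the logger's name; for a spider's self.logger that is the spider name
    fmt = logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")

    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.DEBUG)  # keep low-level messages on screen
    console.setFormatter(fmt)

    file_handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    file_handler.setLevel(logging.INFO)  # only the meaningful messages go to the file
    file_handler.setFormatter(fmt)

    root = logging.getLogger()
    root.setLevel(logging.DEBUG)  # the root must let DEBUG through for the console to see it
    root.addHandler(console)
    root.addHandler(file_handler)


if __name__ == "__main__":
    setup_logging("crawl.log")
    log = logging.getLogger("expats")  # hypothetical spider name
    log.debug("number of rules: 4")  # console only
    log.info("Spider opened")  # console and crawl.log
```

The per-handler level is what lets the file stay terse while the console still shows DEBUG output.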

scrapy

Source: https://stackoverflow.com/questions/8532252/scrapy-logging-to-file-and-stdout-simultaneously-with-spider-names

8 answers


hwamh0ep1#

You want to use the ScrapyFileLogObserver.

import logging
from scrapy.log import ScrapyFileLogObserver

logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()

I'm glad you asked this question; I've been wanting to do this myself.

2022-11-09

57hvy0tb2#

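You can easily redirect the output with the following command: scrapy some-scrapy's-args 2>&1 | tee -a logname
This way, everything Scrapy writes to stdout and stderr is appended to the file logname and also printed on screen.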
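If the crawl is launched from a Python script rather than a shell, the same `2>&1 | tee -a` behaviour can be sketched with `subprocess`: merge stderr into stdout, then copy each line both to the screen and to a file opened in append mode. The helper `tee_command` and the stand-in child command are hypothetical; for a real crawl the command would be something like `["scrapy", "crawl", "<spider>"]`.

```python
import subprocess
import sys


def tee_command(cmd: list[str], log_path: str) -> int:
    """Run cmd, echoing its combined stdout/stderr to the screen and appending it to log_path."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        with subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
        ) as proc:
            for line in proc.stdout:  # read lines as the child produces them
                sys.stdout.write(line)  # screen
                log_file.write(line)  # file, like tee -a
            return proc.wait()


if __name__ == "__main__":
    # Stand-in child process that writes to both streams
    code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
    tee_command([sys.executable, "-c", code], "logname")
```

Because the child's stdout is a pipe, its lines may arrive in a different order than on a terminal; both streams still end up in the file.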


rhfm7lfc3#

For everyone who landed here before reading the current version of the documentation:

import logging
from scrapy.utils.log import configure_logging

configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='log.txt',
    filemode='a',
    format='%(levelname)s: %(message)s',
    level=logging.DEBUG
)


rm5edbpk4#

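I know this is old, but it's a really helpful post, because the class is still not properly documented in the Scrapy docs. Also, we can skip importing logging and use Scrapy's log module directly. Thanks, all!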

from scrapy import log

logfile = open('testlog.log', 'a')
log_observer = log.ScrapyFileLogObserver(logfile, level=log.DEBUG)
log_observer.start()


nmpmafwu5#

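As the official Scrapy docs put it:
Scrapy uses Python's builtin logging system for event logging.
So you can configure your logger just as you would in a normal Python script.
First, import the logging module: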

import logging

You can add this line to your spider:

logging.getLogger().addHandler(logging.StreamHandler())

It adds a stream handler that logs to the console.
After that, configure the log file path.
Add a dict named custom_settings containing your spider-specific settings:

custom_settings = {
    'LOG_FILE': 'my_log.log',
    'LOG_LEVEL': 'INFO',
    # you can add more settings
}

The whole class looks like this:

import logging

import scrapy


class AbcSpider(scrapy.Spider):
    name: str = 'abc_spider'
    start_urls = ['your_url']
    custom_settings = {
        'LOG_FILE': 'my_log.log',
        'LOG_LEVEL': 'INFO',
        # you can add more settings
    }
    # Runs once, when the class body executes: also sends root-logger records to the console
    logging.getLogger().addHandler(logging.StreamHandler())

    def parse(self, response):
        pass
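One caveat with the class above: the `StreamHandler` has no formatter, so console lines carry only the bare message, and if the setup line runs again (for example from `__init__` or another module) a second handler is stacked and every line prints twice. A small idempotent helper avoids both. This is a sketch only; `add_console_handler` and the `_console_for_spiders` marker attribute are hypothetical names, not part of Scrapy.

```python
import logging
import sys


def add_console_handler(level: int = logging.INFO) -> logging.Handler:
    """Attach one formatted console handler to the root logger, reusing it if already present."""
    root = logging.getLogger()
    for handler in root.handlers:
        # A marker attribute identifies the handler this helper added earlier
        if getattr(handler, "_console_for_spiders", False):
            return handler
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(level)
    # %(name)s shows the spider name for records logged via self.logger
    handler.setFormatter(logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s"))
    handler._console_for_spiders = True  # mark it so repeated calls are no-ops
    root.addHandler(handler)
    return handler
```

Calling `add_console_handler()` in place of the bare `addHandler` line keeps exactly one console handler however many times the code runs.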


s4chpxco6#

ScrapyFileLogObserver is no longer supported. You can use the standard Python logging module instead.

import logging
logging.getLogger().addHandler(logging.StreamHandler())


xxls0lw87#

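As of Scrapy 2.3, none of the solutions above worked for me. In addition, the solution in the documentation caused the log file to be overwritten with every message, which is obviously not what you want from a log. I couldn't find a built-in setting to change the mode to "a" (append). I got logging to both a file and stdout working with the following configuration code: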

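import logging

from scrapy.utils.log import configure_logging

filename = "scrapy.log"  # placeholder path; the answer does not say which file it used

configure_logging(settings={
    "LOG_STDOUT": True
})
# mode="a" appends instead of truncating, so earlier runs are kept
file_handler = logging.FileHandler(filename, mode="a")
formatter = logging.Formatter(
    fmt="%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s",
    datefmt="%H:%M:%S"
)
file_handler.setFormatter(formatter)
file_handler.setLevel("DEBUG")
logging.root.addHandler(file_handler)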
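To see what `mode="a"` changes, here is a small stdlib-only check: two short-lived handlers, standing in for two separate crawls, write to the same file. With `mode="w"` the file is truncated each time a handler opens it, so only the last run survives; with `mode="a"` both lines remain. The helpers `run_once` and `lines_after_two_runs` and the logger name `example` are hypothetical.

```python
import logging
import os
import tempfile


def run_once(path: str, message: str, mode: str) -> None:
    """Simulate one crawl: open a FileHandler, log one line, close it."""
    logger = logging.getLogger("example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, mode=mode, encoding="utf-8")
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        logger.removeHandler(handler)
        handler.close()


def lines_after_two_runs(mode: str) -> list[str]:
    """Log from two simulated runs into a fresh file and return its lines."""
    path = os.path.join(tempfile.mkdtemp(), "scrapy.log")
    run_once(path, "first run", mode)
    run_once(path, "second run", mode)
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


print(lines_after_two_runs("w"))  # → ['second run']
print(lines_after_two_runs("a"))  # → ['first run', 'second run']
```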


gxwragnw8#

Another approach is to disable Scrapy's own logging setup and load a custom logging configuration file instead.
settings.py

import logging.config

import yaml

LOG_ENABLED = False

# "import logging" alone does not load the logging.config submodule
with open("logging.yml") as f:
    logging.config.dictConfig(yaml.safe_load(f))

logging.yml

version: 1
formatters:
  simple:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout
  file:
    class: logging.FileHandler
    level: INFO
    formatter: simple
    filename: scrapy.log
root:
  level: INFO
  handlers: [console, file]
disable_existing_loggers: False

example_spider.py

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    def parse(self, response):
        self.logger.info("test")
        pass

