How do I print from process_request in Scrapy?

ddrv8njm · asked 2022-11-09 · in: Other

I'm a Scrapy beginner. I want to print something from the process_request function, but the problem is that I can't see the output when I look at Scrapy's log.
Code inside process_request in middlewares.py:

# inside process_request(self, request, spider)
referr = random.choice(refreerr_list)
print(referr)
# Request headers live on request.headers (plural), not request.header
request.headers['rtt'] = random.choice(rtt)
request.headers['sec-ch-viewport-width'] = random.choice(width_list)
request.headers['sec-ch-device-memory'] = random.choice(memory_list)
request.headers['device-memory'] = random.choice(memory_list)
request.headers['referer'] = referr
request.headers['cookie'] = 'sessi...'

print(self._requests_count)

# rotate the proxy IP every 500 requests
self._requests_count += 1
if self._requests_count > 500:
    self._requests_count = 0
    ip_changer.get_new_ip()

print(f"thitshtisih {self._requests_count}")

Because of this I search the log for thitshtisih, but I can't see it in the terminal log.
I also tried spider.logger.info('Requestingsssssss %..., but I could not find that output in the log either.
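
One detail worth noting from the settings dump below: LOG_FILE is set to filename.log, so Scrapy's log records are written to that file, while print() always writes to stdout. Watching the log file therefore never shows print output. Below is a minimal sketch of a downloader middleware that emits proper log records instead; the middleware name and the referrer list are illustrative placeholders, not the asker's actual code:

import logging
import random

# Module-level logger; its records honor LOG_LEVEL and go to LOG_FILE if set.
logger = logging.getLogger(__name__)

class HeadersMiddleware:  # hypothetical middleware name
    def __init__(self):
        self._requests_count = 0

    def process_request(self, request, spider):
        referer = random.choice(['https://www.google.com/'])  # placeholder list
        request.headers['referer'] = referer
        logger.info("picked referer %s", referer)                 # lands in the log file
        spider.logger.info("request #%d", self._requests_count)   # spider-scoped alternative
        self._requests_count += 1
        return None  # returning None lets Scrapy continue processing the request

Both logger.info and spider.logger.info go through Scrapy's logging setup, so they show up wherever the rest of the log goes; print does not.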
Here is what the log looks like:

2022-07-17 07:02:47 [scrapy.utils.log] INFO: Scrapy 2.6.1 started (bot: amazonasin)
2022-07-17 07:02:47 [scrapy.utils.log] INFO: Versions: lxml 4.9.0.0, libxml2 2.9.4, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.4.0, Python 3.10.4 (v3.10.4:9d38120e33, Mar 23 2022, 17:29:05) [Clang 13.0.0 (clang-1300.0.29.30)], pyOpenSSL 22.0.0 (OpenSSL 3.0.3 3 May 2022), cryptography 37.0.2, Platform macOS-10.14.1-x86_64-i386-64bit
2022-07-17 07:02:47 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
 'AUTOTHROTTLE_MAX_DELAY': 80,
 'AUTOTHROTTLE_START_DELAY': 40,
 'BOT_NAME': 'amazonasin',
 'FEED_EXPORT_ENCODING': 'utf-8-sig',
 'LOG_FILE': 'filename.log',
 'NEWSPIDER_MODULE': 'amazonasin.spiders',
 'SPIDER_MODULES': ['amazonasin.spiders']}
2022-07-17 07:02:47 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2022-07-17 07:02:47 [scrapy.extensions.telnet] INFO: Telnet Password: a03baa23ca473ca6
2022-07-17 07:02:47 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2022-07-17 07:02:48 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Android
2022-07-17 07:02:48 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2022-07-17 07:02:48 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2022-07-17 07:02:48 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None

2022-07-17 07:02:50 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2022-07-17 07:02:50 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Zune
2022-07-17 07:02:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy_user_agents.middlewares.RandomUserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-07-17 07:02:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-07-17 07:02:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-07-17 07:02:50 [scrapy.core.engine] INFO: Spider opened
2022-07-17 07:02:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-07-17 07:02:50 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-07-17 07:02:50 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
2022-07-17 07:02:50 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 ...
2022-07-17 07:02:50 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
2022-07-17 07:02:53 [filelock] DEBUG: Attempting to acquire lock 4706293872 on /Users/adarshraj/.cache/python-tldextract/3.10.4.final__3.10__22a438__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-07-17 07:02:53 [filelock] DEBUG: Lock 4706293872 acquired on /Users/adarshraj/.cache/python-tldextract/3.10.4.final__3.10__22a438__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-07-17 07:02:53 [filelock] DEBUG: Attempting to release lock 4706293872 on /Users/adarshraj/.cache/python-tldextract/3.10.4.final__3.10__22a438__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-07-17 07:02:53 [filelock] DEBUG: Lock 4706293872 released on /Users/adarshraj/.cache/python-tldextract/3.10.4.final__3.10__22a438__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-07-17 07:02:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/dp/B07NP5TZRF> (referer: None)
2022-07-17 07:02:53 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36
2022-07-17 07:02:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.amazon.com/dp/B07NP5TZRF>
{'brand': 'A M D',
 'isbn': 'B07NP5TZRF',
 'mainLink': 'https://www.amazon.com/FX-8350-8-Core-Socket-Processor-Thermal/dp/B07NP5TZRF',
 'permalink': 'FX-8350-8-Core-Socket-Processor-Thermal',
 'price': '$130.00'}
2022-07-17 07:03:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/dp/B010T6CG7E> (referer: None)
2022-07-17 07:03:30 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
2022-07-17 07:03:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.amazon.com/dp/B010T6CG7E>
{'brand': 'Intel',
 'isbn': 'B010T6CG7E',
 'mainLink': 'https://www.amazon.com/Intel-I5-6400-FC-LGA14C-Processor-BX80662I56400/dp/B010T6CG7E',
 'permalink': 'Intel-I5-6400-FC-LGA14C-Processor-BX80662I56400',
 'price': '$183.0'}
2022-07-17 07:03:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/dp/B083RM87F2> (referer: None)
2022-07-17 07:03:43 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
2022-07-17 07:03:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.amazon.com/dp/B083RM87F2>
{'brand': 'AMD',
 'isbn': 'B083RM87F2',
 'mainLink': 'https://www.amazon.com/AMD-3200MHZ-SYSTEM-COMPONENTS-PROCESSORS/dp/B083RM87F2',
 'permalink': 'AMD-3200MHZ-SYSTEM-COMPONENTS-PROCESSORS',
 'price': '$799.95'}
2022-07-17 07:03:50 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 3 pages/min), scraped 3 items (at 3 items/min)
2022-07-17 07:03:52 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force 
2022-07-17 07:03:52 [scrapy.core.engine] INFO: Closing spider (shutdown)
2022-07-17 07:03:52 [scrapy.crawler] INFO: Received SIGINT twice, forcing unclean shutdown
Answer #1, from dw1jzc5e:

def process_links(self, links):
    for link in links:
        # Option 1: filter out unwanted links
        if 'foo' in link.text:
            continue  # skip all links that have "foo" in their text
        # Option 2: rewrite the URL before the request is scheduled
        link.url = link.url + '/'  # add a trailing slash to avoid an unnecessary redirect
        yield link
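
For context, process_links is a CrawlSpider hook: a Rule passes it the list of links its LinkExtractor found, before any requests are built. A minimal sketch of how it is wired up; the spider name, domain, and extractor settings are illustrative:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ExampleSpider(CrawlSpider):  # hypothetical spider
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com/']

    # process_links can be given as the name of a spider method
    rules = (
        Rule(LinkExtractor(), process_links='process_links', callback='parse_item'),
    )

    def process_links(self, links):
        for link in links:
            if 'foo' in link.text:
                continue  # drop links whose anchor text contains "foo"
            yield link

    def parse_item(self, response):
        yield {'url': response.url}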
