I'm trying to run my script without the "scrapy crawl ..." command. I'm following this documentation: https://docs.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script, but my code doesn't work. Any help would be appreciated!
import scrapy
from scrapy.crawler import CrawlerProcess

class misbeneficiosSpider(scrapy.Spider):
    name = 'misbeneficios'
    start_urls = ['https://productos.misbeneficios.com.uy/tv-y-audio',
                  'https://productos.misbeneficios.com.uy/tv-y-audio?p=2']

    def parse(self, response):
        for products in response.css('div.product-item-info'):
            yield {
                'name': products.css('a.product-item-link::text').get(),
                'price': products.css('span.price::text').get().replace('U$S\xa0', '')#[:-3].upper()
            }
        next_page = response.css('a.action.next').attrib['href']
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

process = CrawlerProcess(settings={
    "FEEDS": {
        "items.csv": {"format": "csv"},
    },
})
process.crawl(misbeneficiosSpider)
process.start()
Here is the error output I see:
2022-11-09 00:02:08 [scrapy.utils.log] INFO: Scrapy 2.7.1 started (bot: scrapybot)
2022-11-09 00:02:08 [scrapy.utils.log] INFO: Versions: lxml 4.9.1.0, libxml2 2.9.12, cssselect 1.2.0, parsel 1.7.0, w3lib 2.0.1, Twisted 22.10.0, Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)], pyOpenSSL 22.1.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 38.0.3, Platform Windows-10-10.0.22000-SP0
2022-11-09 00:02:08 [scrapy.crawler] INFO: Overridden settings:
{}
2022-11-09 00:02:08 [py.warnings] WARNING: C:\Users\cabre\AppData\Local\Programs\Python\Python38-32\lib\site-packages\scrapy\utils\request.py:231: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.
It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.
See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
return cls(crawler)
2022-11-09 00:02:08 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2022-11-09 00:02:08 [scrapy.extensions.telnet] INFO: Telnet Password: 034a39dcf0704cf3
2022-11-09 00:02:08 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2022-11-09 00:02:09 [scrapy.core.downloader.handlers] ERROR: Loading "scrapy.core.downloader.handlers.http.HTTPDownloadHandler" for scheme "http"
I can't post the rest of the output.
1 Answer
It looks like your code works fine once a few minor errors are fixed.
By the way, each product card has two span.price tags, and I wasn't sure which one you want, so I just selected the first one. For example:
Output: items.csv
Log