I wrote a script to scrape my website with Scrapy, but at the moment I run 8 separate scripts, one for each of my collections, and each one saves its results to its own CSV file (collection 1 saves to collection-1.csv, and so on).
Is there a way to run multiple spiders from a single script and save the data scraped by each one to its own file?
My current script is below.
import scrapy
from scrapy.crawler import CrawlerProcess
import csv

# Open the output file for this collection and write the CSV header once.
cs = open('results/collection-1-results.csv', 'w', newline='', encoding='utf-8')
header_names = ['stk', 'name', 'price', 'url']
csv_writer = csv.DictWriter(cs, fieldnames=header_names)
csv_writer.writeheader()

class XXX(scrapy.Spider):
    name = 'XXX'
    start_urls = [
        'website-url.com'
    ]

    def parse(self, response):
        # Follow every product link on the collection page.
        product_urls = response.css('div.grid-uniform a.product-grid-item::attr(href)').extract()
        for product_url in product_urls:
            yield scrapy.Request(url='website-url.com' + product_url, callback=self.next_parse_two)

        # Follow the "Next »" pagination link, if there is one.
        next_url = response.css('ul.pagination-custom li a[title="Next »"]::attr(href)').get()
        if next_url is not None:
            yield scrapy.Request(url='website-url.com' + next_url, callback=self.parse)

    def next_parse_two(self, response):
        # Extract the product details and write them straight to the CSV file.
        item = dict()
        item['stk'] = response.css('script#swym-snippet::text').get().split('stk:')[1].split(',')[0]
        item['name'] = response.css('h1.h2::text').get()
        item['price'] = response.css('span#productPrice-product-template span.visually-hidden::text').get()
        item['url'] = response.url
        csv_writer.writerow(item)
        cs.flush()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(XXX)
process.start()
1 Answer
Yes, you can do this by calling process.crawl() separately from the same script for each of your spider classes; that way you start with one spider and add more as needed, like this:
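For reference, here is a minimal sketch of that approach. The spider class names, URLs, and output paths are illustrative, and it uses Scrapy's built-in FEEDS setting (available in Scrapy 2.1+) to give each spider its own CSV instead of the module-level csv writer from the question:

import scrapy
from scrapy.crawler import CrawlerProcess

class CollectionOneSpider(scrapy.Spider):
    # Hypothetical spider for the first collection.
    name = 'collection-1'
    start_urls = ['https://website-url.com/collections/collection-1']
    custom_settings = {
        # FEEDS writes this spider's items to its own CSV file.
        'FEEDS': {'results/collection-1-results.csv': {'format': 'csv'}},
    }

    def parse(self, response):
        # ... same parsing logic as in the question, yielding item dicts ...
        pass

class CollectionTwoSpider(scrapy.Spider):
    # Hypothetical spider for the second collection.
    name = 'collection-2'
    start_urls = ['https://website-url.com/collections/collection-2']
    custom_settings = {
        'FEEDS': {'results/collection-2-results.csv': {'format': 'csv'}},
    }

    def parse(self, response):
        pass

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
})
# One crawl() call per spider class; start() then runs them all in one process.
process.crawl(CollectionOneSpider)
process.crawl(CollectionTwoSpider)
process.start()

With FEEDS declared in each spider's custom_settings there is no need for a shared csv writer at module level, and adding a ninth collection is just one more spider class plus one more process.crawl() call.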