I export my scraped data as JSON. By default, the Scrapy exporter writes a JSON list of dicts. The items look like this:
[{"Product Name":"Product1", "Categories":["Clothing","Top"], "Price":"20.5", "Currency":"USD"},
{"Product Name":"Product2", "Categories":["Clothing","Top"], "Price":"21.5", "Currency":"USD"},
{"Product Name":"Product3", "Categories":["Clothing","Top"], "Price":"22.5", "Currency":"USD"},
{"Product Name":"Product4", "Categories":["Clothing","Top"], "Price":"23.5", "Currency":"USD"}, ...]
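But I want to export the data in a specific format, like the one below. I would set the shop name, location, and contact manually in variables, then take the data I scrape and put it in an array under the "Products" key.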
{
"Shop Name":"Shop 1",
"Location":"XXXXXXXXX",
"Contact":"XXXX-XXXXX",
"Products":
[{"Product Name":"Product1", "Categories":["Clothing","Top"], "Price":"20.5", "Currency":"USD"},
{"Product Name":"Product2", "Categories":["Clothing","Top"], "Price":"21.5", "Currency":"USD"},
{"Product Name":"Product3", "Categories":["Clothing","Top"], "Price":"22.5", "Currency":"USD"},
{"Product Name":"Product4", "Categories":["Clothing","Top"], "Price":"23.5", "Currency":"USD"}, ...]
}
Below is my code, showing how I get the scraped data.
from urllib.parse import urljoin

def parse(self, response):
    # Each div.single_product element holds one product
    for products in response.css('div.single_product'):
        yield {
            'name': products.css('h4.product_name::text').get(),
            'price': products.css('span.current_price::text').get(),
            'code': products.css('div.single_product').attrib['data-itemcode'],
            'url': urljoin("https://xxxx", products.css('a.image-popup-no-margins').attrib['data-image'])
        }
1 Answer
Build the formatted dict and yield it once (instead of yielding each item separately), for example:
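The answer's own code is not preserved here, so the following is only a minimal sketch of the idea under some assumptions. `SHOP_INFO` and `build_export` are hypothetical names introduced for illustration. The selectors and the `https://xxxx` base URL are copied from the question. The `parse` function is meant to replace the spider's `parse` method.

```python
from urllib.parse import urljoin

# Hypothetical shop metadata, set by hand as the question describes.
SHOP_INFO = {
    "Shop Name": "Shop 1",
    "Location": "XXXXXXXXX",
    "Contact": "XXXX-XXXXX",
}


def build_export(shop_info, products):
    """Wrap the scraped product dicts in a single shop-level dict."""
    return {**shop_info, "Products": list(products)}


def parse(self, response):
    # Collect every product on the page first instead of yielding each one.
    products = []
    for product in response.css('div.single_product'):
        products.append({
            'name': product.css('h4.product_name::text').get(),
            'price': product.css('span.current_price::text').get(),
            # Selector kept exactly as written in the question.
            'code': product.css('div.single_product').attrib['data-itemcode'],
            'url': urljoin(
                "https://xxxx",
                product.css('a.image-popup-no-margins').attrib['data-image'],
            ),
        })
    # Yield one item for the whole page: shop fields plus the product list.
    yield build_export(SHOP_INFO, products)
```

Two things to keep in mind with this approach: Scrapy's JSON feed export still writes a list, so the file would contain a single-element list holding this dict, and a spider that parses several pages would yield one such dict per page rather than one combined dict.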