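I'm trying to scrape titles and game links from the main page of the PlayStation web store, and the price of each game from a second page. However, when using the callback parse_page2, all returned items contain the title and item['link'] value of the most recent item (The Last of Us Remastered).
My code is below: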
import scrapy
from scrapy import Request

# PlaystationItem is the project's own Item class; its import is not shown in the original post.


class PsStoreSpider(scrapy.Spider):
    name = 'psstore'
    start_urls = ['https://store.playstation.com/en-ie/pages/browse']

    def parse(self, response):
        item = PlaystationItem()
        products = response.css('a.psw-link')
        for product in products:
            item['main_url'] = response.url
            item['title'] = product.css('span.psw-t-body.psw-c-t-1.psw-t-truncate-2.psw-m-b-2::text').get()
            item['link'] = 'https://store.playstation.com' + product.css('a.psw-link.psw-content-link').attrib['href']
            link = 'https://store.playstation.com' + product.css('a.psw-link.psw-content-link').attrib['href']
            request = Request(link, callback=self.parse_page2)
            request.meta['item'] = item
            yield request

    def parse_page2(self, response):
        item = response.meta['item']
        item['price'] = response.css('span.psw-t-title-m::text').get()
        item['other_url'] = response.url
        yield item
And part of the output:
2022-05-09 19:54:16 [scrapy.core.scraper] DEBUG: Scraped from <200 https://store.playstation.com/en-ie/concept/229261>
{'link': 'https://store.playstation.com/en-ie/concept/228638',
'main_url': 'https://store.playstation.com/en-ie/pages/browse',
'other_url': 'https://store.playstation.com/en-ie/concept/229261',
'price': 'Free',
'title': 'The Last of Us™ Remastered'}
2022-05-09 19:54:16 [scrapy.core.scraper] DEBUG: Scraped from <200 https://store.playstation.com/en-ie/concept/232847>
{'link': 'https://store.playstation.com/en-ie/concept/228638',
'main_url': 'https://store.playstation.com/en-ie/pages/browse',
'other_url': 'https://store.playstation.com/en-ie/concept/232847',
'price': '€59.99',
'title': 'The Last of Us™ Remastered'}
2022-05-09 19:54:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://store.playstation.com/en-ie/concept/224802> (referer: https://store.playstation.com/en-ie/pages/browse)
2022-05-09 19:54:16 [scrapy.core.scraper] DEBUG: Scraped from <200 https://store.playstation.com/en-ie/concept/224802>
{'link': 'https://store.playstation.com/en-ie/concept/228638',
'main_url': 'https://store.playstation.com/en-ie/pages/browse',
'other_url': 'https://store.playstation.com/en-ie/concept/224802',
'price': '€29.99',
'title': 'The Last of Us™ Remastered'}
As you can see, the price is returned correctly, but the title and link are taken from the last scraped object. What am I missing?
Thanks
1 Answer
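The problem is that you create item once at the start of the parse method and then keep updating it, which also means you always pass the same item object to parse_page2. Because the callbacks run after the loop has moved on, every result shows the title and link that were written last. If you create your item inside the for loop instead, you get a new item on each iteration and should get the expected results. Like this: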
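A minimal sketch of the difference, simulated with plain dicts so it runs without Scrapy. The names PRODUCTS, parse_shared, parse_fresh, and crawl are hypothetical stand-ins and do not come from the original post; collecting all "requests" before running the callbacks only approximates how Scrapy schedules them.

```python
# Plain-Python simulation of the spider's control flow (no Scrapy needed).
# All names and data here are hypothetical, for illustration only.

PRODUCTS = [
    ("Game A", "/concept/1"),
    ("Game B", "/concept/2"),
    ("Game C", "/concept/3"),
]


def parse_shared(products):
    """Buggy variant: one dict is created before the loop and reused."""
    item = {}
    for title, link in products:
        item["title"] = title
        item["link"] = link
        yield link, item  # every "request" carries the same dict object


def parse_fresh(products):
    """Fixed variant: a new dict is created on every iteration."""
    for title, link in products:
        item = {"title": title, "link": link}
        yield link, item


def parse_page2(url, item):
    """Stand-in for the second callback: records the URL that was fetched."""
    item["other_url"] = url
    return dict(item)  # snapshot, so later mutations cannot alter the result


def crawl(parse):
    # Callbacks run after the requests have been generated; building the
    # full list first approximates that ordering.
    requests = list(parse(PRODUCTS))
    return [parse_page2(url, item) for url, item in requests]


shared_results = crawl(parse_shared)
fresh_results = crawl(parse_fresh)

print([r["title"] for r in shared_results])  # same title three times
print([r["title"] for r in fresh_results])   # one title per product
```

In the spider from the question, the equivalent change would be moving item = PlaystationItem() from above the for loop to its first line, so that each request carries its own item.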