Scrapy parse function does not pass the found values to the parse_page2 function

3gtaxfhh  posted on 2022-11-09  in Other

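I am trying to scrape the title and game link from the main page of the PlayStation web store, and the price of each game from a second page. However, when I use the callback function parse_page2, every returned item contains the title and item['link'] value of the most recent item (The Last of Us Remastered).
My code is below: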

class PsStoreSpider(scrapy.Spider):
    name = 'psstore'
    start_urls =['https://store.playstation.com/en-ie/pages/browse']

    def parse(self, response):
        item = PlaystationItem()
        products = response.css('a.psw-link')

        for product in products:

            item['main_url'] = response.url
            item['title'] = product.css('span.psw-t-body.psw-c-t-1.psw-t-truncate-2.psw-m-b-2::text').get()
            item['link'] = 'https://store.playstation.com' + product.css('a.psw-link.psw-content-link').attrib['href']
            link = 'https://store.playstation.com' + product.css('a.psw-link.psw-content-link').attrib['href']

            request = Request(link, callback=self.parse_page2)
            request.meta['item'] = item
            yield request

    def parse_page2(self, response):
        item = response.meta['item']
        item['price'] = response.css('span.psw-t-title-m::text').get()
        item['other_url'] = response.url
        yield item

And part of the output:

2022-05-09 19:54:16 [scrapy.core.scraper] DEBUG: Scraped from <200 https://store.playstation.com/en-ie/concept/229261> 
{'link': 'https://store.playstation.com/en-ie/concept/228638',
 'main_url': 'https://store.playstation.com/en-ie/pages/browse',
 'other_url': 'https://store.playstation.com/en-ie/concept/229261',
 'price': 'Free',
 'title': 'The Last of Us™ Remastered'}
2022-05-09 19:54:16 [scrapy.core.scraper] DEBUG: Scraped from <200 https://store.playstation.com/en-ie/concept/232847> 
{'link': 'https://store.playstation.com/en-ie/concept/228638',
 'main_url': 'https://store.playstation.com/en-ie/pages/browse',
 'other_url': 'https://store.playstation.com/en-ie/concept/232847',
 'price': '€59.99',
 'title': 'The Last of Us™ Remastered'}
2022-05-09 19:54:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://store.playstation.com/en-ie/concept/224802> (referer: https://store.playstation.com/en-ie/pages/browse)
2022-05-09 19:54:16 [scrapy.core.scraper] DEBUG: Scraped from <200 https://store.playstation.com/en-ie/concept/224802> 
{'link': 'https://store.playstation.com/en-ie/concept/228638',
 'main_url': 'https://store.playstation.com/en-ie/pages/browse',
 'other_url': 'https://store.playstation.com/en-ie/concept/224802',
 'price': '€29.99',
 'title': 'The Last of Us™ Remastered'}

As you can see, the price is returned correctly, but the title and link come from the last scraped object. What am I missing?
Thanks

axr492tv1#

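The problem is that you create item once at the start of the parse method and then keep updating it, which means you always pass the same object to parse_page2. Scrapy runs the callbacks later, after the loop has moved on, so every item shows the title and link that were written last.
If you create the item inside the for loop instead, you get a new item on each iteration and should see the expected result.
Like this: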

    def parse(self, response):
        products = response.css('a.psw-link')

        for product in products:
            # Create a fresh item on every iteration so each request carries its own object
            item = PlaystationItem()
            item['main_url'] = response.url
            item['title'] = product.css('span.psw-t-body.psw-c-t-1.psw-t-truncate-2.psw-m-b-2::text').get()
            # Build the absolute product URL once and reuse it for the item and the request
            link = 'https://store.playstation.com' + product.css('a.psw-link.psw-content-link').attrib['href']
            item['link'] = link

            request = Request(link, callback=self.parse_page2)
            request.meta['item'] = item
            yield request
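
To see why the shared item goes wrong, here is a minimal sketch in plain Python that needs no Scrapy. It assumes a plain dict standing in for PlaystationItem and uses made-up placeholder titles; the function names shared_item and fresh_item are hypothetical, and collecting the generator into a list only roughly mimics Scrapy queueing the requests before the callbacks run.

```python
def shared_item(titles):
    """Mimics the original parse(): one item created before the loop."""
    item = {}  # stand-in for PlaystationItem(), created only once
    for title in titles:
        item['title'] = title
        yield item  # every yield hands out the same object


def fresh_item(titles):
    """Mimics the fixed parse(): a new item on every iteration."""
    for title in titles:
        item = {}  # new object each time round the loop
        item['title'] = title
        yield item


titles = ['Game A', 'Game B', 'Game C']  # placeholder titles, not real data

# Collect everything first, the way queued requests wait for their callbacks.
shared = list(shared_item(titles))
fresh = list(fresh_item(titles))

print([i['title'] for i in shared])  # ['Game C', 'Game C', 'Game C']
print([i['title'] for i in fresh])   # ['Game A', 'Game B', 'Game C']
print(shared[0] is shared[2])        # True: one object, referenced three times
```

The price and other_url fields in the question looked correct only because parse_page2 overwrote them on the shared object immediately before yielding it.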
