Scrapy: why does my code not return results for the next page?

wljmcqd8  asked on 2022-11-09  in Other
Follow (0) | Answers (1) | Views (147)

This code yields the items on the first page but never moves on to the next page. The first function calls the second function to handle the products on the current page, and after finishing the loop in the first function it should request the next page with itself as the callback, but it doesn't. Any help would be appreciated.

from gc import callbacks
from subprocess import call
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html

class EbaySpider(scrapy.Spider):
    name = 'ebay'
    allowed_domains = ['ebay.co.uk']
    start_urls = ['https://www.ebay.co.uk/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true&_ssn=jessicasmith2022&store_name=jesssuperstoreenterprise&_sop=10&_oac=1&_ipg=240&_fcid=3&_pgn=1']

    for url in start_urls:
        def parse(self, response):
            for link in response.css('.s-item__info.clearfix > a::attr(href)').getall():
                yield response.follow(link, callback=self.productlinks)
            next_page = response.xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "icon-link", " " ))]/@href').extract_first()
            if next_page:
                next_page_link = response.urljoin(next_page)
                yield scrapy.Request(url=next_page_link, callback=self.parse)

6jygbczu  1#

Your selector matches two pagination results: the first is the previous-page link and the second is the next-page link. Because you use extract_first(), you get the previous-page link, and on the first page that link doesn't exist (you are already there), so no next request is ever made.
Your code could also be improved in many other ways. Read the documentation again, and perhaps keep an XPath cheat sheet handy.
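The selector problem described above can be sketched without Scrapy at all, using only the standard library. The markup below is a hypothetical, simplified version of the pagination block (class names and hrefs are illustrative, not copied from the real eBay page):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination markup: both the "previous" and
# "next" anchors carry the same icon-link class.
snippet = """
<nav>
  <a class="icon-link" href="/sch/i.html?_pgn=1">Previous</a>
  <a class="icon-link" href="/sch/i.html?_pgn=3">Next</a>
</nav>
"""

root = ET.fromstring(snippet)
hrefs = [a.get("href") for a in root.findall('.//a[@class="icon-link"]')]

# extract_first() behaves like taking hrefs[0]: it returns the
# PREVIOUS-page link, not the next one. On page 1 there is no
# previous link at all, so the spider never follows anything.
first = hrefs[0]
```

Selecting on a class that only the next-page anchor has, as the answer's code does, sidesteps the ambiguity entirely.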

import scrapy

class EbaySpider(scrapy.Spider):
    name = 'ebay'
    allowed_domains = ['ebay.co.uk']
    start_urls = ['https://www.ebay.co.uk/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true&_ssn=jessicasmith2022&store_name=jesssuperstoreenterprise&_sop=10&_oac=1&_ipg=240&_fcid=3&_pgn=1']

    custom_settings = {'DOWNLOAD_DELAY': 0.5}

    def parse(self, response):
        for link in (response.css('.s-item__info.clearfix > a::attr(href)').getall()):
            yield response.follow(link, callback=self.productlinks)

        next_page = response.xpath('//a[contains(@class, "pagination__next")]/@href').get()
        if next_page:
            next_page_link = response.urljoin(next_page)
            yield scrapy.Request(url=next_page_link, callback=self.parse)

    def productlinks(self, response):
        yield {
            'ITEM_Name': response.xpath('//h1//span//text()').get(),
            'TIMES_Sold': response.xpath('//div[contains(@class, "quantity")]//a/text()').get(default='Unknown'),
            'ITEM_Price £': response.xpath('//span[@id="prcIsum"]/text()').get(default='').replace('£', '')
        }
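The `response.urljoin(next_page)` step resolves the (usually site-relative) pagination href against the current page URL, the same way the standard library's `urljoin` does. A minimal sketch with illustrative hrefs:

```python
from urllib.parse import urljoin

page_url = "https://www.ebay.co.uk/sch/i.html?_pgn=1"

# A root-relative href, as pagination links usually are,
# is resolved against the current page's scheme and host:
next_page_link = urljoin(page_url, "/sch/i.html?_pgn=2")
# -> "https://www.ebay.co.uk/sch/i.html?_pgn=2"

# An already-absolute href passes through unchanged:
absolute = urljoin(page_url, "https://www.ebay.co.uk/sch/i.html?_pgn=3")
```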
