Scrapy not returning all items from pagination

Posted by xfb7svmp on 2022-11-29 in Other

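I want to scrape all the monitor items from the website https://www.startech.com.bd. The problem is that when I run my spider, it returns only 60 results. Here is my code, which is not working correctly: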

import scrapy

import time
class StartechSpider(scrapy.Spider):
    name = 'startech'
    allowed_domains = ['startech.com.bd']
    start_urls = ['https://www.startech.com.bd/monitor/']

    def parse(self, response):
        monitors = response.xpath("//div[@class='p-item']")
        for monitor in monitors:
            item = monitor.xpath(".//h4[@class = 'p-item-name']/a/text()").get()
            price = monitor.xpath(".//div[@class = 'p-item-price']/span/text()").get()
            
            yield{
                'item' : item,
                'price' : price
            }
            
        next_page = response.xpath("//ul[@class = 'pagination']/li/a/@href").get()
        print (next_page)
        
        if next_page:
            yield response.follow(next_page, callback = self.parse)

Any help is greatly appreciated!

67up9zun #1

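The expression //ul[@class = 'pagination']/li/a/@href matches all of the pagination links on a page (about 10 at a time), and .get() returns only the first match, which is not necessarily the link to the next page. You need to select just the one link that points to the next page. The following XPath expression picks the correct pagination link.
Code: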

        next_page = response.xpath("//a[contains(text(), 'NEXT')]/@href").get()
        print(next_page)

        if next_page:
            yield response.follow(next_page, callback=self.parse)

Output:

2022-11-26 01:45:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.startech.com.bd/monitor?page=19> (referer: https://www.startech.com.bd/monitor?page=18)
2022-11-26 01:45:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.startech.com.bd/monitor?page=19>
{'item': 'HP E27q G4 27 Inch 2K QHD IPS Monitor', 'price': '41,000৳'}
None
2022-11-26 01:45:06 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-26 01:45:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 6702,
 'downloader/request_count': 19,
 'downloader/request_method_count/GET': 19,
 'downloader/response_bytes': 546195,
 'downloader/response_count': 19,
 'downloader/response_status_count/200': 19,
 'elapsed_time_seconds': 9.939978,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 11, 25, 19, 45, 6, 915772),
 'httpcompression/response_bytes': 6200506,
 'httpcompression/response_count': 19,
 'item_scraped_count': 361,
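
To illustrate why the original expression can stall while the text-based one keeps going, here is a minimal offline sketch. It does not use Scrapy: it relies on the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so [.='NEXT'] stands in for contains(text(), 'NEXT'). The pagination markup is hypothetical (an assumed layout for page 2 of a listing), not copied from the real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup for "page 2" of a listing.
# The structure of the real site's HTML is an assumption here.
PAGINATION = """
<div>
  <ul class="pagination">
    <li><a href="/monitor?page=1">PREV</a></li>
    <li><a href="/monitor?page=1">1</a></li>
    <li><a href="/monitor?page=3">3</a></li>
    <li><a href="/monitor?page=3">NEXT</a></li>
  </ul>
</div>
"""

root = ET.fromstring(PAGINATION)

# Broad expression: matches every link inside the pagination list.
all_links = root.findall(".//ul[@class='pagination']/li/a")

# Taking the first match (which is what .get() does in Scrapy) yields
# whichever link happens to come first, here the PREV link.
first_href = all_links[0].get("href")

# Targeted expression: only the link whose text is exactly NEXT.
next_link = root.find(".//ul[@class='pagination']/li/a[.='NEXT']")
next_href = next_link.get("href") if next_link is not None else None

print(len(all_links), first_href, next_href)
```

In this sketch the first match points back to a page that was already visited. Scrapy's default duplicate filter drops requests for URLs it has already seen, so a spider following that link may stop early, whereas the NEXT link keeps advancing until the last page, where the lookup returns None and the crawl ends.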
