Why does my Scrapy spider only crawl and not scrape?

jmo0nnb3  posted on 2022-11-09 in Other

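I am crawling this website:
https://www.ebay.com/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true&_ssn=a2z_prime_auto_parts&store_name=a2zprimeautoparts&_oac=1&_pgn=1
I am trying to open each product page and extract its name, price, and other details, but I have run into a problem that is new to me.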

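There are more than 1,800 products in total, all matched by the same XPath, and I want to scrape all of them. But the spider only scrapes 96. What could be the problem?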

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class AutopartSpider(CrawlSpider):
    name = 'Autopart'
    allowed_domains = ['www.ebay.com']
    start_urls = ['https://www.ebay.com/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true&_ssn=a2z_prime_auto_parts&store_name=a2zprimeautoparts&_oac=1']

    rules = (
        Rule(LinkExtractor(restrict_xpaths="//div[@class ='s-item__info clearfix']/a"), callback='parse_item', follow=True),
        Rule(LinkExtractor(restrict_xpaths="//a[@class='pagination__next icon-link']"))
    )

    def parse_item(self, response):
        yield{
            'part_name':response.xpath("//div[@class='vim x-item-title']/h1/span/text()").get()
        }

oiopk7p5  #1

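Your code runs fine, but there is one small problem, which is also why you are seeing response statuses of 200/400: follow=True is in the wrong place. It has to go on the pagination Rule. In the version below, the product-link Rule keeps callback='parse_item' and the pagination Rule gets follow=True.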

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class AutopartSpider(CrawlSpider):
    name = 'Autopart'
    allowed_domains = ['www.ebay.com']
    start_urls = ['https://www.ebay.com/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true&_ssn=a2z_prime_auto_parts&store_name=a2zprimeautoparts&_oac=1&_pgn=1']

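    rules = (
        # Product links on a results page: parse each one with parse_item.
        Rule(LinkExtractor(restrict_xpaths="//div[@class ='s-item__info clearfix']/a"), callback='parse_item'),
        # "Next page" link: no callback, just keep following the pagination.
        Rule(LinkExtractor(restrict_xpaths="//a[@class='pagination__next icon-link']"), follow=True),
    )

    def parse_item(self, response):
        # Yield one item per product page with the title text.
        yield {
            'part_name': response.xpath("//div[@class='vim x-item-title']/h1/span/text()").get()
        }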

Output:

'part_name': 'Rear Wheel Bearing & Hub Assembly For Cadillac Fleetwood 1988-1990 4-Wheel ABS'}
2022-09-07 20:35:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/itm/374136029760?hash=item571c3ec240:g:WrsAAOSwd3lirLyQ> (referer: https://www.ebay.com/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true&_ssn=a2z_prime_auto_parts&store_name=a2zprimeautoparts&_oac=1&_pgn=1)
2022-09-07 20:35:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ebay.com/itm/374136029711?hash=item571c3ec20f:g:sfwAAOSwFFRirLyM>
{'part_name': 'Front Wheel Bearing & Hub Assembly For Pontiac Grand Prix 2003-2008-0238'}
2022-09-07 20:35:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ebay.com/itm/374136029965?hash=item571c3ec30d:g:QG0AAOSwRC1irLyR>
{'part_name': 'Front Wheel Bearing & Hub Assembly For Toyota Prius C 2012-2015'}
2022-09-07 20:35:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/itm/374136029434?hash=item571c3ec0fa:g:JSYAAOSwvspirLyH> (referer: https://www.ebay.com/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true&_ssn=a2z_prime_auto_parts&store_name=a2zprimeautoparts&_oac=1&_pgn=1)
2022-09-07 20:35:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ebay.com/itm/374136029760?hash=item571c3ec240:g:WrsAAOSwd3lirLyQ>
{'part_name': 'Front Wheel Bearing & Hub Assembly For Ford Taurus X 2008-2009 FWD'}
2022-09-07 20:35:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ebay.com/itm/374136029434?hash=item571c3ec0fa:g:JSYAAOSwvspirLyH>
{'part_name': 'Rear Wheel Bearing & Hub Assembly For Dodge Viper 1992-1995'}
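
Every referer in the log above is the results page with _pgn=1, so one way to check whether later pages are reached is to build their URLs and look for them in the crawl log. The following is a minimal sketch using only the standard library. It assumes that _pgn is the page index, as the URLs in the question suggest; the helper name page_url and the constant BASE are hypothetical, and the real number of result pages is not stated in the thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Search URL from the question, without a page number.
BASE = (
    "https://www.ebay.com/sch/i.html?_dmd=2&_dkr=1&iconV2Request=true"
    "&_ssn=a2z_prime_auto_parts&store_name=a2zprimeautoparts&_oac=1"
)


def page_url(base_url: str, page: int) -> str:
    """Return base_url with its _pgn query parameter set to `page`.

    Assumption: _pgn selects the results page, as in the question's URL.
    """
    parts = urlsplit(base_url)
    # Drop any existing _pgn and keep every other parameter in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_pgn"
    ]
    query.append(("_pgn", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Print the first three page URLs to compare against the crawl log.
    for number in range(1, 4):
        print(page_url(BASE, number))
```

If these URLs never show up as "Crawled" lines or as referers in the log, the pagination Rule is probably not matching the next-page link.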
