scrapy 零碎的文件,只运行初始start_urls,而不是运行整个列表

7fyelxc5  于 2022-11-09  发布在  其他
关注(0)|答案(1)|浏览(125)

正如标题所说,我正试图运行我的scrapy程序,我遇到的问题是,它似乎只返回了从初始URL(https://www.antaira.com/products/10-100Mbps)的收益。
我不确定我的程序在哪里不工作,在我的代码中,我也留下了一些关于我所尝试的注解代码。

import scrapy
from ..items import AntairaItem

class ProductJumperFix(scrapy.Spider):  # classes should be TitleCase

    name = 'productJumperFix'
    allowed_domains = ['antaira.com']
    start_urls = [
        'https://www.antaira.com/products/10-100Mbps',
        'https://www.antaira.com/products/unmanaged-gigabit'
        'https://www.antaira.com/products/unmanaged-10-100Mbps-PoE'
        'https://www.antaira.com/products/Unmanaged-Gigabit-PoE'
        'https://www.antaira.com/products/Unmanaged-10-gigabit'
        'https://www.antaira.com/products/Unmanaged-10-gigabit-PoE'
    ]

    #def start_requests(self):
    #    yield scrappy.Request(start_urls, self.parse)

    def parse(self, response):
        # iterate through each of the relative urls
        for url in response.xpath('//div[@class="product-container"]//a/@href').getall():
            product_link = response.urljoin(url)  # use variable
            yield scrapy.Request(product_link, callback=self.parse_new_item)

    def parse_new_item(self, response):
        for product in response.css('main.products'):
            items = AntairaItem() # Unique item for each iteration
            items['product_link'] = response.url # get the product link from response
            name = product.css('h1.product-name::text').get().strip()
            features = product.css(('section.features h3 + ul').strip()).getall()
            overview =   product.css('.products .product-overview::text').getall()
            main_image = response.urljoin(product.css('div.selectors img::attr(src)').get())
            rel_links = product.xpath("//script/@src[contains(., '/app/site/hosting/scriptlet.nl')]").getall()
            items['name'] = name,
            items['features'] = features,
            items['overview'] = overview,
            items['main_image'] = main_image,
            items['rel_links'] = rel_links,
            yield items

谢谢大家!
跟进问题,由于某种原因,当我运行“scrapy crawl productJumperFix”我没有从终端获得任何输出,不知道如何调试,因为我甚至看不到输出错误。

w8ntj3qf

w8ntj3qf1#

请尝试使用start_requests方法:
例如:

import scrapy
from ..items import AntairaItem

class ProductJumperFix(scrapy.Spider):

    name = 'productJumperFix'
    allowed_domains = ['antaira.com']

    def start_requests(self):
        urls = [
            'https://www.antaira.com/products/10-100Mbps',
            'https://www.antaira.com/products/unmanaged-gigabit',
            'https://www.antaira.com/products/unmanaged-10-100Mbps-PoE',
            'https://www.antaira.com/products/Unmanaged-Gigabit-PoE',
            'https://www.antaira.com/products/Unmanaged-10-gigabit',
            'https://www.antaira.com/products/Unmanaged-10-gigabit-PoE',
        ]
        for url in urls:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        for url in response.xpath('//div[@class="product-container"]//a/@href').getall():
            product_link = response.urljoin(url)  # use variable
            yield scrapy.Request(product_link, callback=self.parse_new_item)

    def parse_new_item(self, response):
        for product in response.css('main.products'):
            items = AntairaItem() 
            items['product_link'] = response.url
            name = product.css('h1.product-name::text').get().strip()
            features = product.css(('section.features h3 + ul').strip()).getall()
            overview =   product.css('.products .product-overview::text').getall()
            main_image = response.urljoin(product.css('div.selectors img::attr(src)').get())
            rel_links = product.xpath("//script/@src[contains(., '/app/site/hosting/scriptlet.nl')]").getall()
            items['name'] = name,
            items['features'] = features,
            items['overview'] = overview,
            items['main_image'] = main_image,
            items['rel_links'] = rel_links,
            yield items

相关问题