Scrapy is unable to scrape image URLs

w1e3prcc  posted on 2022-11-09  in Other

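I am trying to scrape data with Scrapy. All of the product data is extracted except for the product image URLs. When I try to extract the image URLs, it returns a list of empty strings, as shown in the image below.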

Project code
menscloths.py (spider)

import scrapy
from ..items import DataItem

class MensclothsSpider(scrapy.Spider):
    name = 'menscloths'
    next_page=2
    start_urls = ['https://www.example.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen&page=1']

    def parse(self, response):
        products=response.css("div._1xHGtK")
        for product in products:
            # Create a fresh item for each product so yielded items do not share state
            items=DataItem()
            name = product.css(".IRpwTa::text").extract()
            brand = product.css("._2WkVRV::text").extract()
            original_price = product.css("._3I9_wc::text").extract()[1]
            sale_price = product.css("._30jeq3::text").extract()[0][1:]
            image_url = product.css("._2r_T1I::attr('src')").extract()
            product_page_url = "https://www.example.com"+product.css("._2UzuFa::attr('href')").extract()[0]
            product_category = "men topwear"

            items["name"]=name
            items["brand"]=brand
            items["original_price"]=original_price
            items["sale_price"]=sale_price
            items["image_url"]=image_url
            items["product_page_url"]=product_page_url
            items["product_category"]=product_category
            yield items

items.py

import scrapy

class DataItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    brand = scrapy.Field()
    original_price = scrapy.Field()
    sale_price = scrapy.Field()
    image_url = scrapy.Field()
    product_page_url = scrapy.Field()
    product_category = scrapy.Field()

settings.py

BOT_NAME = 'scraper'

SPIDER_MODULES = ['scraper.spiders']
NEWSPIDER_MODULE = 'scraper.spiders'

ITEM_PIPELINES = {
   'scraper.pipelines.ScraperPipeline': 300,
}

Thanks in advance.

qxsslcnc1#

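If you watch the images closely while the page loads, you will notice that they appear only after a short delay (for me, at least, the delay is about one second). Your code, however, loads the page and immediately tries to read the image URLs, without waiting for the images to load. You need some kind of wait function that waits for the images to load and only then reads their URLs.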
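
A minimal sketch of such a wait function, under stated assumptions. Scrapy on its own downloads the HTML and does not execute JavaScript, so a wait like this is only useful when the page is rendered in a live browser session (for example through a browser automation tool). The name `wait_for_image_src` and the `get_src` callable are hypothetical and are not part of Scrapy; `get_src` stands in for whatever code reads the `src` attribute of the image element.

```python
import time
from typing import Callable, Optional


def wait_for_image_src(
    get_src: Callable[[], Optional[str]],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> Optional[str]:
    """Poll get_src() until it returns a non-empty string or the timeout expires.

    get_src is a hypothetical callable supplied by the caller, e.g. one that
    reads the src attribute of an <img> element from a live browser session.
    """
    deadline = time.monotonic() + timeout
    while True:
        src = get_src()
        if src:  # a non-empty string means the image URL has been filled in
            return src
        if time.monotonic() >= deadline:
            return None  # gave up: the attribute never became non-empty
        time.sleep(interval)


# Stand-in for a page whose src attribute is empty on the first two reads.
_reads = iter(["", "", "https://www.example.com/img/shirt.jpg"])
print(wait_for_image_src(lambda: next(_reads), timeout=2.0, interval=0.01))
```

Returning `None` on timeout, rather than raising, lets the spider still yield the item with an empty `image_url` field; whether that is preferable depends on how the items are used downstream.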
