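I'm trying to scrape data with Scrapy. All of the product data is extracted except the product image URL. When I try to extract the image URL, I get back a list of empty strings.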
Project code
menscloths.py (spider)
import scrapy

from ..items import DataItem


class MensclothsSpider(scrapy.Spider):
    name = 'menscloths'
    next_page = 2
    start_urls = ['https://www.example.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen&page=1']

    def parse(self, response):
        products = response.css("div._1xHGtK")
        for product in products:
            # Create a fresh item per product so yielded items do not share state.
            items = DataItem()
            name = product.css(".IRpwTa::text").extract()
            brand = product.css("._2WkVRV::text").extract()
            original_price = product.css("._3I9_wc::text").extract()[1]
            # Drop the leading currency symbol from the sale price.
            sale_price = product.css("._30jeq3::text").extract()[0][1:]
            # This is the selector that comes back as a list of empty strings.
            image_url = product.css("._2r_T1I::attr('src')").extract()
            product_page_url = "https://www.example.com" + product.css("._2UzuFa::attr('href')").extract()[0]
            product_category = "men topwear"
            items["name"] = name
            items["brand"] = brand
            items["original_price"] = original_price
            items["sale_price"] = sale_price
            items["image_url"] = image_url
            items["product_page_url"] = product_page_url
            items["product_category"] = product_category
            yield items
items.py
import scrapy


class DataItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    brand = scrapy.Field()
    original_price = scrapy.Field()
    sale_price = scrapy.Field()
    image_url = scrapy.Field()
    product_page_url = scrapy.Field()
    product_category = scrapy.Field()
settings.py
BOT_NAME = 'scraper'
SPIDER_MODULES = ['scraper.spiders']
NEWSPIDER_MODULE = 'scraper.spiders'
ITEM_PIPELINES = {
    'scraper.pipelines.ScraperPipeline': 300,
}
Thanks in advance.
1 Answer
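If you watch the images closely while the page loads, you'll see that they only appear after a short delay (for me, at least, it takes roughly one second). Your code, however, just loads the page and immediately tries to read the image URLs without waiting for the images to load, which is presumably why every src comes back empty. You need some kind of wait function that waits until the images have loaded and only then reads them. Keep in mind that Scrapy by itself does not run the page's JavaScript, so the waiting has to happen in something that actually renders the page, such as a browser-automation tool.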
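As a rough illustration of the kind of wait function meant here, below is a minimal polling sketch that uses only the Python standard library. The names `wait_for_value` and `FakeLazyImage` are hypothetical and are not part of Scrapy. `FakeLazyImage` only simulates an image whose `src` stays empty for the first few reads. In a real crawl, the reader callable would have to query a page that is being rendered with JavaScript support, which is an assumption about your setup.

```python
import time
from typing import Callable, Optional


def wait_for_value(
    read_value: Callable[[], Optional[str]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> str:
    """Poll read_value() until it returns a non-empty string.

    Raises TimeoutError if nothing non-empty shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read_value()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no value after {timeout} seconds")
        time.sleep(interval)


class FakeLazyImage:
    """Hypothetical stand-in for a lazily loaded <img> element.

    get_src() returns "" for the first `empty_reads` calls, then the real URL.
    """

    def __init__(self, final_src: str, empty_reads: int = 3) -> None:
        self._final_src = final_src
        self._empty_reads = empty_reads

    def get_src(self) -> str:
        if self._empty_reads > 0:
            self._empty_reads -= 1
            return ""
        return self._final_src


if __name__ == "__main__":
    image = FakeLazyImage("https://www.example.com/img/shirt.jpg")
    # Reading once, as the spider does, yields an empty string.
    print(repr(image.get_src()))
    # Polling keeps reading until the src has been filled in.
    print(wait_for_value(image.get_src, timeout=2.0, interval=0.01))
```

The timeout is there so that a product that never receives an image raises an error instead of stalling the whole crawl. You could catch `TimeoutError` and store an empty `image_url` for that item.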