When I scrape the site, the spider runs correctly, but the output contains many blank values and some incorrect data.
import scrapy


class AudibleSpider(scrapy.Spider):
    name = 'audible'
    allowed_domains = ['www.audible.com']
    start_urls = ['https://www.audible.com/search/']

    def parse(self, response):
        # Getting the box that contains all the info we want (title, author, length)
        product_container = response.xpath('//div[@class="adbl-impression-container "]//ul')
        # Looping through each product listed in the product_container box
        for product in product_container:
            book_title = product.xpath('.//h3[contains(@class, "bc-heading")]/a/text()').get()
            book_author = product.xpath('.//li[contains(@class, "authorLabel")]/span/a/text()').getall()
            book_length = product.xpath('.//li[contains(@class, "runtimeLabel")]/span/text()').get()
            # Return data extracted
            yield {
                'title': book_title,
                'author': book_author,
                'length': book_length,
            }
        pagination = response.xpath('//ul[contains(@class, "pagingElements")]')
        next_page_url = pagination.xpath('.//span[contains(@class, "nextButton")]/a/@href').get()
        if next_page_url:
            yield response.follow(url=next_page_url, callback=self.parse)
I expect the result to contain the title, author, and length of every audiobook on each page.
The actual result is: https://i.stack.imgur.com/st2lm.png
1 Answer
If you use a more specific selector for the product container, you will get the results you want.
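To see why the broad selector might produce blank and mismatched rows, here is a minimal sketch. It rests on several assumptions: the markup is a made-up, simplified fragment rather than Audible's real page, the class name `productListItem` is hypothetical, and the standard library's `xml.etree.ElementTree` stands in for Scrapy's selectors so the snippet runs without a crawl. The point it illustrates is that a trailing `//ul` matches every `ul` inside the container, including nested ones, so some matches hold several books and others hold no title at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure may differ.
FRAGMENT = """
<div class="adbl-impression-container ">
  <ul class="results">
    <li class="productListItem">
      <h3 class="bc-heading"><a>Book One</a></h3>
      <ul class="details">
        <li class="authorLabel"><span><a>Author A</a></span></li>
        <li class="runtimeLabel"><span>Length: 5 hrs</span></li>
      </ul>
    </li>
    <li class="productListItem">
      <h3 class="bc-heading"><a>Book Two</a></h3>
      <ul class="details">
        <li class="authorLabel"><span><a>Author B</a></span></li>
        <li class="runtimeLabel"><span>Length: 7 hrs</span></li>
      </ul>
    </li>
  </ul>
</div>
""".strip()


def extract(nodes):
    """Pull title, author list, and length out of each candidate node."""
    rows = []
    for node in nodes:
        rows.append({
            # findtext returns None when nothing matches, like .get() in Scrapy
            "title": node.findtext(".//h3/a"),
            "author": [a.text for a in node.findall(".//li[@class='authorLabel']/span/a")],
            "length": node.findtext(".//li[@class='runtimeLabel']/span"),
        })
    return rows


root = ET.fromstring(FRAGMENT)

# Broad: every <ul> under the container, outer list and nested lists alike.
broad = extract(root.findall(".//ul"))

# Narrow: exactly one node per book.
narrow = extract(root.findall(".//li[@class='productListItem']"))

print(len(broad), broad)
print(len(narrow), narrow)
```

On this fragment the broad query yields three rows: one for the outer list, which mixes the authors of both books, and two for the nested lists, which have no title. The narrow query yields one complete row per book.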
For example: