Scrapy follow vs. follow_all

Posted by yvt65v4c on 2022-11-09 in Other


The Scrapy tutorial has an example spider, QuotesSpider.
When following links:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

This code fetches all pages.
Alternatively:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)

        yield from response.follow_all(css='li.next a', callback=self.parse)
        # or equivalently
        # urls = response.css("li.next a")
        # yield from response.follow_all(urls=urls, callback=self.parse)

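But when I replace yield from response.follow_all(css='li.next a', callback=self.parse) with yield response.follow(css='li.next a', callback=self.parse), it only fetches page 1. Since response.css("li.next a") returns at most one selector, I expected the latter to fetch all pages as well. Why doesn't it?
Thanks in advance!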

scrapy

Source: https://stackoverflow.com/questions/62484419/scrapy-follow-vs-follow-all

1 answer


hmae6n7t1#

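A bit late, but:
This happens because response.follow() does not accept css as a keyword argument, so the spider saves the first page and then fails with a TypeError. The response you get is a TextResponse. When you call response.follow_all(css=...), the class runs self.css(css) internally; that call returns an iterable of selectors from which the link URLs are taken, which is why it works. response.follow() has no such handling: it expects a single URL, Link, or selector as its first argument.
You can read more in the TextResponse documentation.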

2022-11-09
