Scraping multiple author names with Scrapy in Python raises an error because the inner div and CSS class change for each author

Posted by idv4meu8 on 2022-11-09 in Python

I am trying to scrape multiple author names with Scrapy in Python, but the inner div and the CSS class differ for each author, and I get this error: AttributeError: 'SelectorList' object has no attribute 'response'

class NewsSpider(scrapy.Spider):
    name = "travelandleisure"

    def start_requests(self):
        url = input("Enter the article url: ")

        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        try:
            Authoro = response.css('div.comp mntl-bylines__group--author mntl-bylines__group mntl-block')
            Author = []
            for item in Authoro.response.css('div.comp mntl-bylines__item mntl-attribution__item::text'):
                Authoro.append(item)
            for item in Authoro.response.css('div.comp mntl-bylines__item mntl-attribution__item mntl-attribution__item--has-date::text'):
                Authoro.append(item)
        except IndexError:
            Author = "NULL"
        yield{
            'Category':Category,
            'Headlines':Headlines,
            'Author': Author,
        }

Here is the link to the site; see the HTML code for the authors: https://www.travelandleisure.com/travel-news/where-can-americans-travel-right-now-a-country-by-country-guide

Answer #1 (mum43rcc)

Here is one way to get the authors from that page:

[...]
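    def parse(self, response):
        # .get() returns None when nothing matches, so default to '' before stripping
        title = response.xpath('//h1[@id="article-heading_1-0"]/text()').get(default='')
        # Collect the text of every author link; the same name may appear more than once
        names = [x.strip() for x in response.xpath('//a[@class="mntl-attribution__item-name"]/text()').getall()]
        # dict.fromkeys drops duplicates while keeping page order (set() does not keep order)
        authors = ', '.join(dict.fromkeys(names))
        ##[... other stuff from page]
        yield {
            'title': title.strip(),
            'authors': authors,
            ## [..]
        }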

Scrapy documentation: https://docs.scrapy.org/en/latest/
