I want to scrape all customer complaints from https://www.reclameaqui.com.br/empresa/santander/lista-reclamacoes/?status=NOT_ANSWERED.
My code:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy_splash import SplashRequest


class ComplaintScraper(CrawlSpider):
    name = "ComplaintScraper"
    allowed_domains = ["https://www.reclameaqui.com.br"]
    start_urls = [
        "https://www.reclameaqui.com.br/empresa/santander/lista-reclamacoes/?status=NOT_ANSWERED",  # only complaints that were not answered
    ]

    '''
    LinkExtractor: an object which defines how links will be extracted from each crawled page.
    '''
    rules = (
        Rule(LinkExtractor(restrict_css='#__next > div.sc-1mzw716-0.bbugAk > div.sc-1mzw716-1.dAjixN > div.wydd4i-0.jaTnlr > main > '
                           + 'section.wydd4i-5.bDtuKO > div.sc-gJpXkD.ebMJNx.xh9b9g-0.jjQFrx > '
                           + 'div.sc-1sm4sxr-0.eFXbXn > div:nth-child(1) > a'),
             callback="parse_complaint", follow=True),
    )

    def start_requests(self):
        # Render the start URL through Splash before handing it to the callback
        for url in self.start_urls:
            yield SplashRequest(url, self.parse_complaint, endpoint='render.html', args={'wait': 0.5})

    def parse_complaint(self, response):
        print("Hi", response.url)
However, I don't even see the first n links printed in the console, i.e.:
- https://www.reclameaqui.com.br/santander/desacordo-comercial_EDEwdRrqoHSC5Win/
- https://www.reclameaqui.com.br/santander/estorno-de-seguro-residencial_RsOgQiG151B-n_x8/
- https://www.reclameaqui.com.br/santander/do-pix_wylD_ba-c_LBOGQ2/
- https://www.reclameaqui.com.br/santander/veiculo-quitado_AP-e15Kmdo0zhlqd/
- ...
Where did I go wrong? How can I get all the customer complaints from this list: https://www.reclameaqui.com.br/empresa/santander/lista-reclamacoes/?status=NOT_ANSWERED?
1 Answer
I was able to scrape them just fine without using Splash; the output comes out as JSON.
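Below is a minimal sketch of what such a Splash-free spider could look like. It assumes the list page, being a Next.js app (hence the `#__next` selector above), embeds its data in a `<script id="__NEXT_DATA__">` tag in the raw HTML; the spider name, the yielded fields, and the JSON path down to the complaint list are assumptions that should be verified against the actual page source.

```python
import json

import scrapy


class ComplaintNoSplashSpider(scrapy.Spider):
    # Hypothetical spider: reads the JSON that the Next.js page embeds in its
    # HTML instead of rendering JavaScript with Splash.
    name = "complaint_nosplash"
    allowed_domains = ["reclameaqui.com.br"]
    start_urls = [
        "https://www.reclameaqui.com.br/empresa/santander/lista-reclamacoes/?status=NOT_ANSWERED",
    ]

    def parse(self, response):
        raw = response.css("script#__NEXT_DATA__::text").get()
        if raw is None:
            self.logger.warning("No __NEXT_DATA__ script found on %s", response.url)
            return
        data = json.loads(raw)
        # NOTE: this path is a guess -- dump `data` once and locate where the
        # complaint list really lives before relying on it.
        complaints = (
            data.get("props", {})
            .get("pageProps", {})
            .get("complainResult", {})
            .get("complains", {})
            .get("data", [])
        )
        for item in complaints:
            yield {
                "title": item.get("title"),
                "description": item.get("description"),
                "url": response.urljoin(item.get("url", "")),
            }
```

Running it with `scrapy crawl complaint_nosplash -o complaints.json` would then export the scraped items as JSON.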