allowed_domains = ['www.google.com', 'google.com']
start_urls = ['https://www.google.com/search?q=mobiles&tbm=pts&sxsrf=AJOqlzXrlIIii_GtGMCheGMJHKPpQl1hLw%3A1673692348905&source=hp&ei=vITCY_2YNOKVxc8P79uA2A8&iflsig=AK50M_UAAAAAY8KSzHAkD8f8N_ul8boy27FJhuidI9c7&ved=0ahUKEwj95qrv7cb8AhXiSvEDHe8tAPsQ4dUDCAg&uact=5&oq=mobiles&gs_lcp=Cg9nd3Mtd2l6LXBhdGVudHMQAzIECCMQJzIFCAAQkQIyBAgAEEMyCggAEIAEEIcCEBQyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQgwEyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQyQM6CAgAELEDEIMBOgUIABCABDoFCAAQsQM6BQgAEJIDUABYygxg1g1oAHAAeACAAfADiAG4DpIBAzQtNJgBAKABAQ&sclient=gws-wiz-patents']
These are the parse and other_link functions:
def parse(self, response):
    # Title of the first search result (extracted but not used below)
    title = response.xpath("//div[@class='yuRUbf']/a/h3/text()").extract_first()
    # Link of the first search result
    related_data = response.xpath("//div[@class='yuRUbf']/a/@href").get()
    yield response.follow(url=related_data, callback=self.other_link)

def other_link(self, response):
    heading = response.xpath("//div[@class='abstract style-scope patent-text']/text()").get()
    yield {
        'heading': heading
    }
This is the output I get:
DEBUG: Crawled (200) <GET https://www.google.com/search?q=mobiles&tbm=pts&sxsrf=AJOqlzXrlIIii_GtGMCheGMJHKPpQl1hLw%3A1673692348905&source=hp&ei=vITCY_2YNOKVxc8P79uA2A8&iflsig=AK50M_UAAAAAY8KSzHAkD8f8N_ul8boy27FJhuidI9c7&ved=0ahUKEwj95qrv7cb8AhXiSvEDHe8tAPsQ4dUDCAg&uact=5&oq=mobiles&gs_lcp=Cg9nd3Mtd2l6LXBhdGVudHMQAzIECCMQJzIFCAAQkQIyBAgAEEMyCggAEIAEEIcCEBQyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQgwEyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQyQM6CAgAELEDEIMBOgUIABCABDoFCAAQsQM6BQgAEJIDUABYygxg1g1oAHAAeACAAfADiAG4DpIBAzQtNJgBAKABAQ&sclient=gws-wiz-patents> (referer: None)
2023-01-14 16:43:26 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.google.com.pk': <GET https://www.google.com.pk/patents/WO2006010333A1?cl=en&dq=mobiles&hl=en&sa=X&ved=2ahUKEwiCmP_c_cb8AhW-qZUCHW4ZABYQ6AF6BAgFEAM>
2023-01-14 16:43:26 [scrapy.core.engine] INFO: Closing spider (finished)
2023-01-14 16:43:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
Can you help me?
1 Answer
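This should work. You need to update allowed_domains so that it also covers www.google.com.pk, the domain of the request that the offsite middleware filtered out in your log.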
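To illustrate why the request was dropped, here is a minimal sketch in plain Python that approximates how an offsite check treats a host: it is accepted only if it equals an allowed domain or is a subdomain of one. The helper name is_onsite and the shortened patent URL are assumptions made for this example; they are not part of Scrapy or of the original spider.

```python
import re
from urllib.parse import urlparse


def is_onsite(url, allowed_domains):
    """Hypothetical helper: True if the URL's host equals, or is a
    subdomain of, one of the allowed domains."""
    host = urlparse(url).hostname or ""
    # Optional subdomain prefix followed by one of the allowed domains
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.search(pattern, host) is not None


# Shortened form of the URL that was filtered in the log
patent_url = "https://www.google.com.pk/patents/WO2006010333A1?cl=en&dq=mobiles&hl=en"

original = ["www.google.com", "google.com"]
updated = original + ["google.com.pk"]  # also covers www.google.com.pk

print(is_onsite(patent_url, original))  # False
print(is_onsite(patent_url, updated))   # True
```

In the spider itself, the corresponding change would be to add 'google.com.pk' to allowed_domains. If Google redirects you to a different country domain, that domain would need to be listed instead.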