Scrapy DEBUG: Filtered offsite request

Posted by yk9xbfzb on 2023-01-17 in Other
allowed_domains = ['www.google.com','google.com',]
start_urls = ['https://www.google.com/search?q=mobiles&tbm=pts&sxsrf=AJOqlzXrlIIii_GtGMCheGMJHKPpQl1hLw%3A1673692348905&source=hp&ei=vITCY_2YNOKVxc8P79uA2A8&iflsig=AK50M_UAAAAAY8KSzHAkD8f8N_ul8boy27FJhuidI9c7&ved=0ahUKEwj95qrv7cb8AhXiSvEDHe8tAPsQ4dUDCAg&uact=5&oq=mobiles&gs_lcp=Cg9nd3Mtd2l6LXBhdGVudHMQAzIECCMQJzIFCAAQkQIyBAgAEEMyCggAEIAEEIcCEBQyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQgwEyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQyQM6CAgAELEDEIMBOgUIABCABDoFCAAQsQM6BQgAEJIDUABYygxg1g1oAHAAeACAAfADiAG4DpIBAzQtNJgBAKABAQ&sclient=gws-wiz-patents']

These are the parse and other_link functions:

def parse(self, response):
    title = response.xpath("//div[@class='yuRUbf']/a/h3/text()").extract_first()
    realetd_data = response.xpath("//div[@class='yuRUbf']/a/@href").get()

    yield response.follow(url=realetd_data, callback=self.other_link)

def other_link(self, response):
    heading = response.xpath("//div[@class='abstract style-scope patent-text']/text()").get()

    yield {
        'heading': heading
    }

I get this:

DEBUG: Crawled (200) <GET https://www.google.com/search?q=mobiles&tbm=pts&sxsrf=AJOqlzXrlIIii_GtGMCheGMJHKPpQl1hLw%3A1673692348905&source=hp&ei=vITCY_2YNOKVxc8P79uA2A8&iflsig=AK50M_UAAAAAY8KSzHAkD8f8N_ul8boy27FJhuidI9c7&ved=0ahUKEwj95qrv7cb8AhXiSvEDHe8tAPsQ4dUDCAg&uact=5&oq=mobiles&gs_lcp=Cg9nd3Mtd2l6LXBhdGVudHMQAzIECCMQJzIFCAAQkQIyBAgAEEMyCggAEIAEEIcCEBQyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQgwEyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQyQM6CAgAELEDEIMBOgUIABCABDoFCAAQsQM6BQgAEJIDUABYygxg1g1oAHAAeACAAfADiAG4DpIBAzQtNJgBAKABAQ&sclient=gws-wiz-patents> (referer: None)
2023-01-14 16:43:26 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.google.com.pk': <GET https://www.google.com.pk/patents/WO2006010333A1?cl=en&dq=mobiles&hl=en&sa=X&ved=2ahUKEwiCmP_c_cb8AhW-qZUCHW4ZABYQ6AF6BAgFEAM>
2023-01-14 16:43:26 [scrapy.core.engine] INFO: Closing spider (finished)
2023-01-14 16:43:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

Can you help me?

qvk1mo1f  #1

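allowed_domains = ['www.google.com', 'google.com', 'google.com.pk']

This should work. You need to update allowed_domains to include the domain of the filtered request (www.google.com.pk). Entries must be bare domain names, with no URL scheme such as https:// and no leading whitespace.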
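To illustrate why the request to www.google.com.pk was filtered, here is a rough approximation of how a request's host is compared against allowed_domains. It is a simplified sketch using only the standard library, not Scrapy's actual implementation, and the helper name is_allowed is hypothetical.

```python
import re
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one.

    Hypothetical helper: a simplified approximation of an offsite check.
    """
    host = urlparse(url).hostname or ""
    # One pattern that matches any allowed domain, optionally preceded by subdomains.
    domains = "|".join(re.escape(d) for d in allowed_domains)
    pattern = re.compile(rf"^(.*\.)?({domains})$")
    return bool(pattern.search(host))


url = "https://www.google.com.pk/patents/WO2006010333A1?cl=en&dq=mobiles"

# Original list: google.com.pk is neither google.com nor a subdomain of it.
print(is_allowed(url, ["www.google.com", "google.com"]))  # False

# Adding the bare domain lets the request through.
print(is_allowed(url, ["www.google.com", "google.com", "google.com.pk"]))  # True

# A full URL (with scheme and leading space) never matches a host name.
print(is_allowed(url, ["www.google.com", "google.com", " https://www.google.com.pk"]))  # False
```

Under this sketch, google.com.pk is treated as a separate domain from google.com, which is consistent with the "Filtered offsite request" line in the log.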
