Scrapy spider crawled 0 pages: is it an XPath or URL parameter error?

ca1c2owp  posted on 2022-11-09  in  Other

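I have just started playing with Scrapy, so any guidance or tips are appreciated. I want to scrape the result data (let's say just the item titles, for simplicity) from the following real-estate page: url = "https://www.sreality.cz/en/search/for-sale/apartments/praha?disposition=2%2Bkt&published=month&min-floor=1&max-floor=3", where the search parameters are supplied in the URL (GET method).
I tried the following basic spider: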

import scrapy
import json

class Sp1Spider(scrapy.Spider):
    name = 'sp1'
    allowed_domains = ['www.sreality.cz']
    start_urls = ['https://www.sreality.cz/en/search/for-sale/apartments']

    def parse(self, response):
        apartments = response.xpath('//basci/h2/title/@content').extract()
        yield {"apartment Text ": apartments}

However, I have not managed to scrape any data or content from the target page above, not even the page title!

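  • First of all, I would like to know whether I need to take care of the parameters sent in the URL by the GET method myself (as is the case with the POST method), or whether they are handled automatically.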

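P.S. The item title is located at the xpath '//basi/h2/title/', which contains a span with the double class "name ng-binding". I tried to work around this by grabbing the entire content of that element, so I would get markup in the result, which is fine for now.
Please help.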

slsn1g29  #1

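1. First, the URL you put in the start_urls list is a dynamic (JavaScript-driven) one, but the content is also available as a static HTML DOM.
2. If you turn off JavaScript in your browser and refresh the URL, you will notice that the URL changes.
3. Use that changed URL as the request URL, because it is the static one.
4. Your XPath expression is slightly off: basic is a class value, not an element name, so it has to be matched as an attribute.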
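To illustrate the last point above, here is a minimal sketch of the difference between matching an element name and matching a class attribute. It uses only the standard library's ElementTree so it runs without Scrapy, and the markup in SAMPLE is a hypothetical, simplified stand-in for one listing card, not the real page source. The same attribute predicate can be passed to response.xpath() in a spider.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one listing card (assumption:
# the real page is more complex, only the class names come from the thread).
SAMPLE = """
<div>
  <div class="basic">
    <h2><a class="title"><span class="name ng-binding">Apartment 2+kt</span></a></h2>
    <span class="locality ng-binding">Na Dračkách, Praha 6</span>
  </div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# The original expression treats "basci" and "title" as element names,
# so it matches nothing: no such elements exist in the document.
by_tag = root.findall(".//basci/h2/title")

# Matching on the class attribute finds the listing card.
cards = root.findall(".//*[@class='basic']")

# Relative lookup inside each card, as in the spider's inner xpath call.
localities = [
    card.find(".//*[@class='locality ng-binding']").text
    for card in cards
]

print(len(by_tag), localities)
```

Note that an exact @class comparison only matches when the attribute value is exactly that string, which is why the double class has to be written out in full as "locality ng-binding".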

Working code example:

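import scrapy

class Sp1Spider(scrapy.Spider):
    name = 'sp1'
    # Static HTML version of the search page (the URL shown once JavaScript is off)
    start_urls = ['https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=']

    def parse(self, response):
        # Each listing card is an element with class="basic"
        apartments = response.xpath('//*[@class="basic"]')
        for apartment in apartments:
            # Relative XPath: text of the "locality ng-binding" element inside this card
            title = apartment.xpath('.//*[@class="locality ng-binding"]/text()').get()

            yield {
                'title': title
            }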

Output:

{'title': 'M. Švabinského, Bílina - Teplické Předměstí'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Na Dračkách, Praha 6'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Veleslavínova, Praha - Staré Město'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Nádražní, Žlutice'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Lumírova, Praha 2 - Nusle'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Oldřichova, Praha 2 - Nusle'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Lipno nad Vltavou, district Český Krumlov'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'J. Opletala, České Budějovice - České Budějovice 2'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Ovesná, Hostivice'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Nová výstavba, Obrnice'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Plzeň - Jižní Předměstí, district Plzeň-město'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Rychnovská, Jablonec nad Nisou - Kokonín'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Vídeňská třída, Znojmo'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Brno, district Brno-město'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Čajkovského, Karviná - Mizerov'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Mečíková, Praha 10 - Záběhlice'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Ružinovská, Praha 4 - Krč'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Jagellonská, Praha 3 - Vinohrady'}
2022-10-29 19:12:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sreality.cz/en/search/for-sale/apartments?_escaped_fragment_=>
{'title': 'Šaldova, Praha 8 - Karlín'}
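
On the GET question: query parameters are simply part of the URL, so Scrapy requests whatever URL you list, query string included, and nothing has to be sent separately the way form data is for POST. Below is a minimal sketch of building the filtered URL from the question with the standard library. The names BASE, filters and search_url are only illustrative, and I have not checked whether the static _escaped_fragment_ page honors these filters, so treat that part as an assumption to verify.

```python
from urllib.parse import urlencode

# Path taken from the URL in the question.
BASE = "https://www.sreality.cz/en/search/for-sale/apartments/praha"

# Filters copied from the question's query string.
filters = {
    "disposition": "2+kt",  # urlencode escapes "+" as %2B
    "published": "month",
    "min-floor": 1,
    "max-floor": 3,
}

# urlencode joins the pairs with "&" and percent-encodes the values.
search_url = f"{BASE}?{urlencode(filters)}"
print(search_url)
```

The resulting string can be placed in start_urls like any other URL.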
