Python: I want to scrape indiamart.com, but it returns nothing

Posted by rn0zuynd on 2023-02-15 in Python

I am new to Scrapy. I want to scrape data from alibaba.com, but I get nothing back. I don't know what the problem is. Here is my code:

import scrapy


class IndiaSpider(scrapy.Spider):
    name = 'india'
    allowed_domains = ['indiamart.com']
    # search_value = 'car'
    start_urls = ['https://dir.indiamart.com/search.mp?ss=laptop&prdsrc=1&res=RC4']

    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'

    # Scrapy only calls start_requests() for the initial requests, and
    # scrapy.Request expects a single URL string, not the whole list.
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse, headers={'User-Agent': self.user_agent})

    def parse(self, response):
        title = response.xpath("//span[@class='elps elps2 p10b0 fs14 tac mListNme']/a/text()").get()
        related_link = response.xpath("//span[@class='elps elps2 p10b0 fs14 tac mListNme']/a/@href").get()

        yield {
            'titling': title,
            'rel_link': related_link,
        }

And this is what I get:

2023-02-14 15:20:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dir.indiamart.com/search.mp?ss=car&prdsrc=1&res=RC4>

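{'title': None, 'related_link': None, 'images': []}
2023-02-14 15:20:34 [scrapy.core.engine] INFO: Closing spider (finished)

Yesterday I got results and it worked fine, but today it returns None. It is not a JavaScript-based website. I have tried more than once, but the result is the same.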

Answer #1 (dgiusagp)

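As @SuperUser told you, the spider gets None because the website uses JavaScript to render the product information. If you disable JavaScript in your browser and reload the page, you will see that the products are not displayed.
However, you can get the information from one of the <script> tags.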

import scrapy
import json

class AlibabaSpider(scrapy.Spider):
    name = "alibaba"
    allowed_domains = ["alibaba.com"]
    search_value = "laptop"
    start_urls = [f"https://www.alibaba.com/trade/search?fsb=y&IndexArea=product_en&CatId=&tab=all&SearchText={search_value}"]

    def parse(self, response):
        # Find the <script> whose text assigns the page data object.
        raw_data = response.xpath("//script[contains(., 'window.__page__data__config')]/text()").extract_first()
        # Strip the surrounding JavaScript so that only the JSON literal remains.
        raw_data = raw_data.replace("window.__page__data__config = ", "").replace("window.__page__data = window.__page__data__config.props", "")
        data = json.loads(raw_data)

        # Title of the first product in the result list.
        title = data["props"]["offerResultData"]["offerList"][0]["information"]["puretitle"]
        yield {"title": title} # Laptops Laptop Cheapest OEM Core I5...
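
The string replacement above depends on the exact text of the script, so it can break if the page adds a semicolon or changes the trailing statement. Below is a slightly more defensive sketch of the same idea using only the standard library. The helper names `extract_page_config` and `first_title` are hypothetical, the sample string is made up to mimic the layout and is not real site data, and the key path is taken from the spider above and is assumed to still be valid.

```python
import json

# Assignment prefix taken from the spider above.
MARKER = "window.__page__data__config = "


def extract_page_config(script_text, marker=MARKER):
    """Return the JSON value assigned right after `marker`, or None if it cannot be parsed."""
    start = script_text.find(marker)
    if start == -1:
        return None
    idx = start + len(marker)
    # raw_decode does not skip leading whitespace, so advance past it manually.
    while idx < len(script_text) and script_text[idx].isspace():
        idx += 1
    try:
        # raw_decode parses one JSON value and ignores whatever follows it.
        data, _end = json.JSONDecoder().raw_decode(script_text, idx)
    except json.JSONDecodeError:
        return None
    return data


def first_title(data):
    """Follow the key path used in the spider; return None if any level is missing."""
    try:
        return data["props"]["offerResultData"]["offerList"][0]["information"]["puretitle"]
    except (KeyError, IndexError, TypeError):
        return None


# Made-up sample that mimics the script layout (assumption, not real site data).
sample = (
    'window.__page__data__config = {"props": {"offerResultData": {"offerList": '
    '[{"information": {"puretitle": "Example laptop"}}]}}}\n'
    'window.__page__data = window.__page__data__config.props'
)
print(first_title(extract_page_config(sample)))
```

Inside `parse` you could pass the result of `extract_first()` (after checking that it is not None) to `extract_page_config`, and yield an item only when `first_title` returns a value. That way a change in the page layout produces an empty result instead of an exception.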
