Python: I want to scrape indiamart.com, but it returns None

Posted by rn0zuynd on 2023-02-15 in Python


I'm new to Scrapy. I want to scrape data from alibaba.com, but I get nothing back. I don't know where the problem is. Here is my code:

class IndiaSpider(scrapy.Spider):
    name = 'india'
    allowed_domains = ['indiamart.com']
    # search_value = 'car'
    start_urls = [f'https://dir.indiamart.com/search.mp?ss=laptop&prdsrc=1&res=RC4']

    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'

    def request_header(self):
        yield scrapy.Request(url=self.start_urls, callback=self.parse, headers={'User-Agent':self.user_agent})

    def parse(self, response):
        title = response.xpath("//span[@class='elps elps2 p10b0 fs14 tac mListNme']/a/text()").get()
        related_link = response.xpath("//span[@class='elps elps2 p10b0 fs14 tac mListNme']/a/@href").get()

        yield {
            'titling': title,
            'rel_link': related_link
        }

And this is what I get:

2023-02-14 15:20:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dir.indiamart.com/search.mp?ss=car&prdsrc=1&res=RC4>

{'titling': None, 'rel_link': None, 'images': []}
2023-02-14 15:20:34 [scrapy.core.engine] INFO: Closing spider (finished)
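Yesterday I got results and it worked fine, but today it returns None. It is not a JavaScript-based website. I have tried more than once, but the result is the same.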


Source: https://stackoverflow.com/questions/75157900/i-wants-to-scrape-indiamart-com-but-it-returns-none

1 Answer


dgiusagp1#

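As @SuperUser told you, the spider gets None because the website uses JavaScript to render the product information. If you disable JavaScript in your browser and reload the page, you will see that the products are not displayed.
However, you can get the information from one of the <script> tags.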

import json

import scrapy


class AlibabaSpider(scrapy.Spider):
    name = "alibaba"
    allowed_domains = ["alibaba.com"]
    search_value = "laptop"
    start_urls = [f"https://www.alibaba.com/trade/search?fsb=y&IndexArea=product_en&CatId=&tab=all&SearchText={search_value}"]

    def parse(self, response):
        # The product data is embedded as JSON inside an inline <script> tag.
        raw_data = response.xpath("//script[contains(., 'window.__page__data__config')]/text()").extract_first()
        # Strip the surrounding JavaScript so that only the JSON object remains.
        raw_data = raw_data.replace("window.__page__data__config = ", "").replace("window.__page__data = window.__page__data__config.props", "")
        data = json.loads(raw_data)

        # Title of the first product in the result list.
        title = data["props"]["offerResultData"]["offerList"][0]["information"]["puretitle"]
        yield {"title": title}  # Laptops Laptop Cheapest OEM Core I5...
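The string replacement above depends on the exact JavaScript text around the JSON, so it may break if the page changes its spacing or adds statements. A slightly more tolerant way to pull the object out is sketched below using only the standard library: `json.JSONDecoder.raw_decode` parses one JSON value starting at a given index and ignores whatever follows. The helper name `extract_page_config` and the shortened `sample_script` are made up for illustration; only the variable name `window.__page__data__config` and the key path to `puretitle` come from the code above, and the real page layout may differ.

```python
import json

# Variable name taken from the spider above.
MARKER = "window.__page__data__config"


def extract_page_config(script_text, marker=MARKER):
    """Return the JSON object assigned to `marker`, or None if it is absent."""
    start = script_text.find(marker)
    if start == -1:
        return None
    # Jump to the opening brace of the object literal after the marker.
    brace = script_text.find("{", start + len(marker))
    if brace == -1:
        return None
    # raw_decode parses a single JSON value beginning at `brace` and ignores
    # any trailing JavaScript after the closing brace.
    data, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return data


# Hypothetical, heavily shortened script body, for illustration only.
sample_script = (
    'window.__page__data__config = {"props": {"offerResultData": {"offerList": '
    '[{"information": {"puretitle": "Example laptop"}}]}}}\n'
    "window.__page__data = window.__page__data__config.props"
)

config = extract_page_config(sample_script)
offers = config["props"]["offerResultData"]["offerList"]
# Collect every title instead of only the first offer.
titles = [offer["information"]["puretitle"] for offer in offers]
print(titles)  # ['Example laptop']
```

Inside a spider, the same helper could be called on the text returned by the `//script[...]/text()` XPath, with a check for `None` before indexing into the result.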

Answered 2023-02-15
