Using Python Scrapy to extract XPath from a live soccer site

Posted by yhuiod9q on 2022-11-09 in Python


I am trying to use Scrapy to return the results and statistics of live matches on SofaScore.
Site: https://www.sofascore.com/
Code below:

import scrapy

class SofascoreSpider(scrapy.Spider):
    name = 'SofaScore'
    allowed_domains = ['sofascore.com']
    start_urls = ['http://sofascore.com/']

    def parse(self, response):
        time1 = response.xpath("/html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div/div").extract()
        print(time1)

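I also tried response.xpath("//html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div/div").getall(), but it returns nothing either. I have tried many different XPaths, and none of them return anything. What am I doing wrong?
For example, today (10/06) the first match on the page is France vs. Austria, with XPath: /html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div/div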

scrapy

Source: https://stackoverflow.com/questions/72568822/using-python-scrapy-to-extract-xpath-in-a-soccer-live-site

1 Answer


z31licg01#

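The data is generated by JavaScript, so it is not in the HTML that Scrapy downloads, which is why the XPath matches nothing. It can be fetched from the API instead.
Open devtools in your browser, click the Network tab, then click the Live button and see where the page loads its data from. Then open the JSON response and look at its structure.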
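
If the JSON response is large, listing its key paths can make the structure easier to read than scrolling through it in devtools. The following is a minimal sketch using only the standard library. The helper name `key_paths` and the `SAMPLE` string are made up for illustration; `SAMPLE` is not real API output, and the `placeholder` field in it is invented.

```python
import json


def key_paths(node, prefix=""):
    """Yield (path, type name) pairs for every leaf of a decoded JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from key_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        if node:
            # Inspect only the first item, assuming list items share one shape.
            yield from key_paths(node[0], f"{prefix}[]")
        else:
            yield f"{prefix}[]", "empty list"
    else:
        yield prefix, type(node).__name__


# Hypothetical sample for illustration only, NOT copied from the real API.
SAMPLE = (
    '{"events": [{"tournament": {"name": "Example Cup"},'
    ' "time": {"placeholder": 0}}]}'
)

if __name__ == "__main__":
    for path, kind in key_paths(json.loads(SAMPLE)):
        print(f"{path}: {kind}")
```

Once the key paths are known, the spider only needs to request that endpoint and read the fields of interest.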

import scrapy

class SofascoreSpider(scrapy.Spider):
    name = 'SofaScore'
    allowed_domains = ['sofascore.com']
    start_urls = ['https://api.sofascore.com/api/v1/sport/football/events/live']
    custom_settings = {'DOWNLOAD_DELAY': 0.4}

    def start_requests(self):
        headers = {
            "Accept": "*/*",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.5",
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "DNT": "1",
            "Host": "api.sofascore.com",
            "Origin": "https://www.sofascore.com",
            "Pragma": "no-cache",
            "Referer": "https://www.sofascore.com/",
            "Sec-Fetch-Dest": "empty",
            "Sec-Fetch-Mode": "cors",
            "Sec-Fetch-Site": "same-site",
            "Sec-GPC": "1",
            "TE": "trailers",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
        }
        yield scrapy.Request(url=self.start_urls[0], headers=headers)

    def parse(self, response):
        events = response.json()
        events = events['events']
        # now iterate through the list and get what you want from it
        # example:
        for event in events:
            yield {
                'event name': event['tournament']['name'],
                'time': event['time']
            }
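
The `parse` method above indexes the payload directly, so one entry without a `tournament` key would raise `KeyError` and stop the callback. Here is a hedged sketch of a more defensive extraction, assuming only the keys used above (`events`, `tournament`, `name`, `time`). The function name `summarize_events` and the sample payload are hypothetical, and the `placeholder` field is invented for the example.

```python
def summarize_events(payload):
    """Return one small dict per event, skipping entries that lack a name."""
    rows = []
    for event in payload.get("events", []):
        # "tournament" may be missing or null; fall back to an empty dict.
        tournament = event.get("tournament") or {}
        name = tournament.get("name")
        if name is None:
            continue  # skip malformed entries instead of raising KeyError
        rows.append({"event name": name, "time": event.get("time")})
    return rows


# Hypothetical payload for illustration only, NOT real API output.
SAMPLE_PAYLOAD = {
    "events": [
        {"tournament": {"name": "Example Cup"}, "time": {"placeholder": 0}},
        {"time": {"placeholder": 1}},  # malformed: no tournament
    ]
}

if __name__ == "__main__":
    for row in summarize_events(SAMPLE_PAYLOAD):
        print(row)
```

Inside the spider, `parse` could then be reduced to `yield from summarize_events(response.json())`.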

2022-11-09
