How can I format my data so that each team is paired with its score? (Scrapy, Python)

bbuxkriu  posted on 2022-11-09  in Python

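I'm having trouble formatting my scraped data. Any suggestions on how I can extract it into four columns (winning team, losing team, winning score, losing score)?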

import scrapy

class sportsDataSpider(scrapy.Spider):
    name = "sportsSite"
    allowed_domains = ["www.espn.com"]
    start_urls = ["https://www.espn.com/nhl/scoreboard/_/date/20220504"]

    handle_httpstatus_list = [404]

    def parse(self, response,**kwargs):
        hockey_score_selector = response.css(".ScoreCell__Team--scoreboard").extract()
        loser_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__Score::text"
        winner_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__Score::text"
        team_sel = ".ScoreboardPage .ScoreCell__TeamName--shortDisplayName::text"

        loser_score = response.css(loser_sel).extract()
        winner_score = response.css(winner_sel).extract()
        teams = response.css(team_sel).extract()

        yield {
            'losing score': loser_score,
            'winning score': winner_score,
            'teams': teams
        }

This is the current output I get from this code.

{'losing score': ['2', '3', '2', '0'], 'winning score': ['5', '5', '6', '6'], 'teams': ['Bruins', 'Hurricanes', 'Lightning', 'Maple Leafs', 'Blues', 'Wild', 'Kings', 'Oilers']}

m3eecexj1#

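Instead of collecting all the team names at once with a selector scoped to .ScoreboardPage, collect them as two groups: one scoped to .ScoreboardScoreCell__Item--loser and one scoped to .ScoreboardScoreCell__Item--winner. That way the team lists come back in the same order as the score lists. So: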

    def parse(self, response, **kwargs):
        hockey_score_selector = response.css(".ScoreCell__Team--scoreboard").extract()
        loser_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__Score::text"
        winner_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__Score::text"

        # team_sel = ".ScoreboardPage .ScoreCell__TeamName--shortDisplayName::text"
        loser_team_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__TeamName--shortDisplayName::text"
        winner_team_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__TeamName--shortDisplayName::text"

        loser_score = response.css(loser_sel).extract()
        winner_score = response.css(winner_sel).extract()

        # teams = response.css(team_sel).extract()
        loser_teams = response.css(loser_team_sel).extract()
        winner_teams = response.css(winner_team_sel).extract()

        yield {
            'losing score': loser_score,
            'winning score': winner_score,
            # 'teams': teams,
            'losing team': loser_teams,
            'winning team': winner_teams
        }

Output:

{'losing score': ['2', '3', '2', '0'],
 'winning score': ['5', '5', '6', '6'],
 'losing team': ['Bruins', 'Maple Leafs', 'Blues', 'Kings'],
 'winning team': ['Hurricanes', 'Lightning', 'Wild', 'Oilers']}

For example, the Bruins lost to the Hurricanes, and so on.
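The output above is still four parallel lists rather than the four columns asked for in the question. A minimal sketch of one way to turn them into one record per game follows. The helper name `pair_results` is hypothetical (not part of Scrapy), and it assumes that every game on the page contributes exactly one winner element and one loser element, in the same order, and that the score text is numeric:

```python
def pair_results(winner_teams, loser_teams, winner_scores, loser_scores):
    """Combine four parallel lists into one dict per game (hypothetical helper)."""
    lengths = {
        len(winner_teams),
        len(loser_teams),
        len(winner_scores),
        len(loser_scores),
    }
    # Pairing by position is only meaningful if all lists have the same length.
    if len(lengths) != 1:
        raise ValueError("lists differ in length; cannot pair results by position")

    return [
        {
            "winning team": w_team,
            "losing team": l_team,
            "winning score": int(w_score),  # assumes the score text is numeric
            "losing score": int(l_score),
        }
        for w_team, l_team, w_score, l_score in zip(
            winner_teams, loser_teams, winner_scores, loser_scores
        )
    ]


# Sample data copied from the output shown above.
rows = pair_results(
    ["Hurricanes", "Lightning", "Wild", "Oilers"],
    ["Bruins", "Maple Leafs", "Blues", "Kings"],
    ["5", "5", "6", "6"],
    ["2", "3", "2", "0"],
)
for row in rows:
    print(row)
```

Inside the spider, the single `yield {...}` could then be replaced with `yield from pair_results(winner_teams, loser_teams, winner_score, loser_score)`, so that Scrapy receives one item per game. Running `scrapy crawl sportsSite -o results.csv` should then export them as rows with four columns. If the page includes games that are unfinished or otherwise lack winner/loser markup, the lists may not line up, which is why the sketch raises an error instead of pairing silently.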
