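I'm having trouble formatting my scraped data. Any suggestions on how I can extract it into four columns (winning team, losing team, winning score, losing score)?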
import scrapy


class sportsDataSpider(scrapy.Spider):
    name = "sportsSite"
    allowed_domains = ["www.espn.com"]
    start_urls = ["https://www.espn.com/nhl/scoreboard/_/date/20220504"]
    handle_httpstatus_list = [404]

    def parse(self, response, **kwargs):
        hockey_score_selector = response.css(".ScoreCell__Team--scoreboard").extract()
        loser_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__Score::text"
        winner_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__Score::text"
        team_sel = ".ScoreboardPage .ScoreCell__TeamName--shortDisplayName::text"
        loser_score = response.css(loser_sel).extract()
        winner_score = response.css(winner_sel).extract()
        teams = response.css(team_sel).extract()
        yield {
            'losing score': loser_score,
            'winning score': winner_score,
            'teams': teams
        }
This is the output I currently get from this code:
{'losing score': ['2', '3', '2', '0'], 'winning score': ['5', '5', '6', '6'], 'teams': ['Bruins', 'Hurricanes', 'Lightning', 'Maple Leafs', 'Blues', 'Wild', 'Kings', 'Oilers']}
1 Answer
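Instead of collecting all the teams at once based on .ScoreboardPage, try collecting them as two groups, one based on .ScoreboardScoreCell__Item--loser and one based on .ScoreboardScoreCell__Item--winner, the same way the scores are already collected. The team names then come back split into losers and winners, so each losing team can be paired with the team that beat it. For example, the Bruins lost to the Hurricanes, and so on.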
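A minimal sketch of the pairing step, not the original answer's code. It assumes the team name element is nested inside each loser/winner item, so that selectors such as ".ScoreboardScoreCell__Item--loser .ScoreCell__TeamName--shortDisplayName::text" and its "--winner" counterpart would return the two team lists; that combination is an assumption and has not been checked against the page. It also assumes the four lists come back in the same game order. The helper name build_rows is hypothetical.

```python
def build_rows(winning_teams, losing_teams, winning_scores, losing_scores):
    """Pair four parallel lists into one four-column row per game."""
    columns = (winning_teams, losing_teams, winning_scores, losing_scores)
    # If the lists differ in length, the selectors did not line up game by game.
    if len({len(col) for col in columns}) != 1:
        raise ValueError("selector results are not the same length")
    return [
        {
            "winning team": w_team,
            "losing team": l_team,
            "winning score": int(w_score),
            "losing score": int(l_score),
        }
        for w_team, l_team, w_score, l_score in zip(*columns)
    ]


# One game taken from the question's output: the Bruins lost to the Hurricanes, 2 to 5.
rows = build_rows(["Hurricanes"], ["Bruins"], ["5"], ["2"])
print(rows)
```

Inside parse, the four lists would come from response.css(...).extract(), and each row could be yielded on its own rather than as one combined dict, so that a feed export such as CSV writes one line per game.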