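For several years I have used the code below to scrape a table from a website and save it to an Excel file. It suddenly stopped working and I don't know why. Below is an edited version of the code.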
import requests
import pandas as pd

# These are the headers I pass (copied from the browser's request)
headers = {
    'accept': 'application/json, text/plain, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.6',
    'cookie': '[get the authentication cookie string from website and paste it here]',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'sec-gpc': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'
}

url = ("https://[site].com/api/v1/teams/overview"
       "?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20")

overview_2023 = requests.get(url, headers=headers).json()
# print(overview_2023.keys())

# overview_2023['team_overview'] is a list with one dict per team
teamdata = overview_2023['team_overview']

# Fields that some teams may lack; a missing key becomes None
optional_fields = ['wins', 'losses', 'ties', 'points_allowed', 'points_scored',
                   'grades_coverage_defense', 'grades_defense', 'grades_misc_st',
                   'grades_offense', 'grades_overall', 'grades_pass', 'grades_pass_block',
                   'grades_pass_route', 'grades_pass_rush_defense', 'grades_run',
                   'grades_run_block', 'grades_run_defense', 'grades_tackle']

SiteGrades = {}
for team in teamdata:
    row = {'name': team['name'], 'franchise_id': team['franchise_id'],
           'abbreviation': team['abbreviation']}
    row.update({field: team.get(field) for field in optional_fields})
    SiteGrades[team['name']] = row

# One row per team (same result as from_dict(...) followed by .T)
gradestable = pd.DataFrame.from_dict(SiteGrades, orient='index')
gradestable.to_excel(r'C:\[path]\2023SiteExports.xlsx', sheet_name='2023grades', index=False)
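The loop above builds each row field by field and substitutes None for missing keys. pandas can do the same directly from the list of team records: build a DataFrame from the list of dicts, then `reindex(columns=...)` keeps the wanted columns in order and inserts NaN (rather than None) wherever a team lacks a key. A minimal sketch, assuming made-up sample records in place of the API's `team_overview` list and a shortened, hypothetical column list:

```python
import pandas as pd

# Columns to keep, in output order (shortened here; extend with the other grades_* fields).
COLUMNS = ["name", "franchise_id", "abbreviation", "wins", "losses", "grades_overall"]


def build_grades_table(team_overview: list) -> pd.DataFrame:
    """One row per team record; keys missing from a record become NaN."""
    return pd.DataFrame(team_overview).reindex(columns=COLUMNS)


# Hypothetical sample records standing in for overview_2023['team_overview'].
sample = [
    {"name": "Team A", "franchise_id": 1, "abbreviation": "TA",
     "wins": 3, "losses": 1, "grades_overall": 71.2},
    {"name": "Team B", "franchise_id": 2, "abbreviation": "TB", "wins": 2},  # no losses/grades
]

table = build_grades_table(sample)
print(table)
# table.to_excel(r'C:\[path]\2023SiteExports.xlsx', sheet_name='2023grades', index=False)
```

`reindex` also adds a column of NaN if no team has that key at all, so the Excel layout stays stable even when the API omits a field.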
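Suddenly I am getting JSONDecodeError: Expecting value.
I expect the result to be an Excel file containing the requested data.
I have already updated the authentication cookie, so that is not the problem.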
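"Expecting value" is the message Python's json module gives when the text does not begin with a JSON value at all, for example an empty body, an HTML page, or bytes that are still compressed. In recent versions of requests, `Response.json()` re-raises it as `requests.exceptions.JSONDecodeError` with the same message. A small stdlib-only illustration (the helper name is made up for this sketch):

```python
import json
from typing import Optional


def json_error(text: str) -> Optional[str]:
    """Return the json module's error message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"{exc.msg} (line {exc.lineno}, column {exc.colno})"
    return None


print(json_error('{"team_overview": []}'))  # None
print(json_error(""))                       # Expecting value (line 1, column 1)
print(json_error("<!DOCTYPE html><html>"))  # Expecting value (line 1, column 1)
```

An error at line 1, column 1 means the very first character was already not JSON, so the problem is what the server sent (or how it was encoded), not the parsing code further down.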
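When I test the request with:
response = requests.get("https://[site].com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20", headers=headers)
if response.status_code == 200:
    try:
        data = response.json()
    except ValueError:  # JSONDecodeError is a subclass of ValueError
        print("Response not in expected JSON format.")
        print("Response content:", response.text)
else:
    print("Request failed with status code:", response.status_code)
    print("Response content:", response.text)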
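I get "Response not in expected JSON format.", and the printed content is mostly garbled characters.
But when I inspect the same request in the browser's developer tools, the Response tab under Fetch/XHR shows clean, JSON-formatted data.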
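One plausible cause of garbled text in Python while the browser shows clean JSON is the accept-encoding header copied from the browser: it advertises br (Brotli), which the browser decodes natively, whereas requests (through urllib3) only decodes Brotli when the brotli or brotlicffi package is installed and otherwise returns the compressed bytes unchanged. Whether this site actually switched to Brotli is an assumption, but it can be checked. The hypothetical helper below is a stdlib-only sketch that guesses the cause from the Content-Encoding header and the raw body:

```python
import json
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def describe_body(content_encoding: Optional[str], body: bytes) -> str:
    """Hypothetical helper: rough guess at why a response body does not parse as JSON."""
    encodings = [e.strip().lower() for e in (content_encoding or "").split(",") if e.strip()]
    try:
        json.loads(body)  # json.loads accepts bytes and detects UTF-8/16/32
        return "valid JSON"
    except ValueError:  # covers both json.JSONDecodeError and UnicodeDecodeError
        pass
    if "br" in encodings:
        return ("Brotli-encoded but not decoded: install a Brotli package "
                "or drop 'br' from accept-encoding")
    if body.startswith(GZIP_MAGIC):
        # requests normally decompresses gzip itself, so this case is unusual
        return "still gzip-compressed"
    if body.lstrip().lower().startswith((b"<!doctype", b"<html")):
        return "an HTML page (login page, error page, or moved endpoint?) instead of JSON"
    return "not JSON; first bytes: " + repr(body[:40])


# With requests:
# print(describe_body(response.headers.get("Content-Encoding"), response.content))
```

Passing `response.content` (bytes) rather than `response.text` avoids requests guessing a text encoding for data that may not be text at all.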
2 Answers
Answer 1
Remove the headers= argument you are passing (or at least remove the accept-encoding key). Try:
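A minimal sketch of that suggestion, assuming the question's headers dictionary and URL; the helper names are hypothetical. It drops only the accept-encoding entry, so requests falls back to its own default Accept-Encoding, which (via urllib3) lists only encodings it can actually decode: gzip and deflate, plus br only when a Brotli package is installed.

```python
import requests


def headers_without_encoding(headers: dict) -> dict:
    """Copy of the browser headers minus any Accept-Encoding entry (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def fetch_json(url: str, headers: dict, timeout: float = 30.0):
    """GET url without a custom Accept-Encoding and return the decoded JSON."""
    response = requests.get(url, headers=headers_without_encoding(headers), timeout=timeout)
    response.raise_for_status()  # surface 401/403/404 instead of a confusing JSON error
    return response.json()


# Usage with the question's headers (not run here):
# overview_2023 = fetch_json(
#     "https://[site].com/api/v1/teams/overview?league=ncaa&season=2023"
#     "&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
#     headers,
# )
```

Alternatively, keeping the headers unchanged and installing a Brotli decoder (for example `pip install brotli`) should let requests decode br responses itself.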
Prints:
Answer 2
The URI has changed. Here is the new URI: