Python web scraping script raises "JSONDecodeError: Expecting value" after working for more than a year

cygmwpex  posted on 2023-11-20  in Python

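For a few years I have been using the code below to scrape a table from a website and write it to an Excel file. Suddenly it stopped working, and I don't know why. Below is an edited version of the code.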

import requests
import pandas as pd
#from pandas import DataFrame
#import json
#from pandas.io.json import json_normalize
#from bs4 import BeautifulSoup as soup

#These are the headers I pass
headers = {
    'accept': 'application/json, text/plain, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.6',
    'cookie': '[get the authentication cookie string from website and paste it here]',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'sec-gpc': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'
}
overview_2023 = requests.get("https://[site].com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20", headers=headers).json()
overviewkeys = overview_2023.keys()
#print(overviewkeys)
#overview_2023.get('restricted')
#print(overview_2023['restricted'])
#overview_2023['team_overview'] points to a list - the one within the dict it belongs to
#print(overview_2023['team_overview'])
teamdata = overview_2023['team_overview']

Site2023teamgrades = requests.get('https://[site].com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20', headers=headers).json()

# Optional fields default to None when the API omits them for a team.
optional_fields = [
    'wins', 'losses', 'ties', 'points_allowed', 'points_scored',
    'grades_coverage_defense', 'grades_defense', 'grades_misc_st',
    'grades_offense', 'grades_overall', 'grades_pass', 'grades_pass_block',
    'grades_pass_route', 'grades_pass_rush_defense', 'grades_run',
    'grades_run_block', 'grades_run_defense', 'grades_tackle',
]

SiteGrades = {}
for team in Site2023teamgrades['team_overview']:
    # These three keys are required; a missing one raises KeyError as before.
    row = {
        'name': team['name'],
        'franchise_id': team['franchise_id'],
        'abbreviation': team['abbreviation'],
    }
    row.update({field: team.get(field) for field in optional_fields})
    SiteGrades[team['name']] = row

gradestable = pd.DataFrame.from_dict(SiteGrades)
gradestable = gradestable.T

gradestable.to_excel(r'C:\[path]\2023SiteExports.xlsx', sheet_name = '2023grades', index = False)

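Suddenly, I am getting JSONDecodeError: Expecting value.
I expect the result to be an Excel file containing the required data.
I have already refreshed the authentication cookie, so that is not the problem.
When I test the code with: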

if response.status_code == 200:
    try:
        data = response.json()
    except ValueError:
        print("Response not in expected JSON format.")
        print("Response content:", response.text)

else:
    print("Request failed with status code:", response.status_code)
    print("Response content:", response.text)


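I get "Response not in expected JSON format", and the printed content is mostly garbled characters.
But when I inspect the request on the source site, the "Response" tab under Fetch/XHR shows the data as clean JSON.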
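Garbled text from `requests` alongside clean JSON in the browser is consistent with a body that is still compressed. As far as I know, `requests` decodes gzip and deflate on its own but only decodes Brotli (`br`) when an optional `brotli` or `brotlicffi` package is installed. The sketch below is one way to check for that. `describe_response` is a hypothetical helper, not part of `requests`, and the list of codings it recognises is an assumption.

```python
import json


def describe_response(content_encoding, body_bytes):
    """Classify a response body as JSON or as a still-compressed payload.

    Hypothetical helper: pass response.headers.get("Content-Encoding")
    and response.content from a requests.Response.
    """
    try:
        json.loads(body_bytes.decode("utf-8"))
        return "json"
    except ValueError:
        # Covers both UnicodeDecodeError and json.JSONDecodeError.
        pass
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("br", "gzip", "deflate", "zstd"):
        # The server compressed the body and it was not decoded client-side.
        return "still compressed (" + encoding + ")"
    return "not json"
```

With a live response this could be called as `describe_response(response.headers.get("Content-Encoding"), response.content)`. A result of `still compressed (br)` would point at the `accept-encoding` header rather than at the cookie.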

2ul0zpep #1

Remove the headers= argument you are passing (or at least remove the accept-encoding key). Try:

import requests

url = "https://premium.pff.com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"

data = requests.get(url).json()
print(data)

Prints:

{'restricted': ['grades_coverage_defense', 'grades_defense', 'grades_misc_st',

...
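A likely reason this helps: the original headers advertise `br` (Brotli) in `accept-encoding`, and as far as I know `requests` can only decode a Brotli body when an optional `brotli` or `brotlicffi` package is installed. Without it, `.json()` is handed compressed bytes. If the other headers (for example the cookie) are still needed, a minimal sketch is to keep them and strip only `br`. `without_brotli` is a hypothetical helper name, not a `requests` API.

```python
def without_brotli(headers):
    """Return a copy of headers whose accept-encoding no longer offers Brotli."""
    cleaned = {}
    for key, value in headers.items():
        if key.lower() == "accept-encoding":
            codings = [c.strip() for c in value.split(",")]
            # Compare the coding name only, ignoring any ";q=..." weight.
            kept = [c for c in codings
                    if c and c.split(";")[0].strip().lower() != "br"]
            if kept:
                cleaned[key] = ", ".join(kept)
            # If only "br" was listed, the header is dropped entirely so
            # that requests falls back to its own default.
        else:
            cleaned[key] = value
    return cleaned


headers = {
    "accept": "application/json, text/plain, */*",
    "accept-encoding": "gzip, deflate, br",
}
print(without_brotli(headers)["accept-encoding"])  # gzip, deflate
```

The request then becomes `requests.get(url, headers=without_brotli(headers)).json()`. Installing a Brotli package instead should also work, but that has not been verified against this site.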

cgyqldqp #2

The URI has changed. Here is the new URI:
[URI not preserved in the source]
