I'm in a good place with the way I'm formatting a dataframe that I built from a college football schedule I scraped from ESPN:
import pandas as pd
from datetime import datetime

# Scrape last season's schedule from ESPN (team id 213)
year = datetime.today().year-1
url = 'https://www.espn.com/college-football/team/schedule/_/id/213/season/'+str(year)
schedule = pd.read_html(url)[0][[0,1]]
schedule.columns = ["Date", "Opponent"]
# Drop the header and section rows that ESPN mixes into the table
remove_list = ["DATE","Regular","Bowl"]
schedule = schedule[~schedule["Date"].str.contains('|'.join(remove_list))].reset_index(drop = True)
# Strip the "vs", "*" and "@" markers; regex=False treats each one as a literal string
schedule['Opponent'] = schedule['Opponent'].str.replace("vs", '', regex=False).str.replace("*", '', regex=False).str.replace("@", '', regex=False)
# Drop the weekday prefix and append the year so the string can be parsed
date_list = (schedule['Date'].str[5:]+', '+str(year))
final_date_list = []
for d in date_list:
    d = datetime.strptime(d, '%b %d, %Y')
    # Year, day, month order, matching the table shown below
    d = datetime.strftime(d, '%Y%d%m')
    final_date_list.append(d)
schedule['Date'] = pd.DataFrame(final_date_list)
schedule
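However, what I'd like to do is remove the numbers that precede some of the team names in the Opponent column of the current table: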
    Date      Opponent
0   20210409  12 Wisconsin
1   20211109  Ball State
2   20211809  22 Auburn
3   20212509  Villanova
4   20210210  Indiana
5   20210910  3 Iowa
6   20212310  Illinois
7   20213010  5 Ohio State
8   20210611  Maryland
9   20211311  6 Michigan
10  20212011  Rutgers
11  20212711  12 Michigan State
12  20210101  21 Arkansas
1 Answer
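You can use str.replace with a regular expression to remove the leading digits and the whitespace that follows them from the Opponent column.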
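The answer's original snippet did not survive on this page, so the following is only a minimal sketch of the str.replace approach it describes, not the answerer's exact code. The pattern `^\s*\d+\s+` and the trailing `.str.strip()` are assumptions chosen for this illustration, and the sample rows are copied from the table in the question.

```python
import pandas as pd

# Sample rows copied from the table in the question
schedule = pd.DataFrame({
    "Date": ["20210409", "20211109", "20211809", "20212509"],
    "Opponent": ["12 Wisconsin", "Ball State", "22 Auburn", "Villanova"],
})

# ^\s*  optional whitespace at the start of the string
# \d+   one or more digits
# \s+   the whitespace separating the digits from the team name
# Names without a leading number do not match and are left unchanged.
schedule["Opponent"] = (
    schedule["Opponent"]
    .str.replace(r"^\s*\d+\s+", "", regex=True)
    .str.strip()  # assumption: also trim any stray surrounding whitespace
)

print(schedule["Opponent"].tolist())
# ['Wisconsin', 'Ball State', 'Auburn', 'Villanova']
```

Anchoring the pattern with `^` keeps it from touching digits elsewhere in a name, and passing `regex=True` explicitly avoids depending on the default, which differs between pandas versions.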