I'm scraping NBA statistics from a web page and want to be able to sort the stats, such as points, assists, and blocks.
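I have my pandas DataFrame, and it prints the players and their stats correctly. It also sorts correctly by integer columns such as age, as shown below.
[Image: example of the DataFrame sorted by age]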
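However, when I try to sort by points, it does not sort from the highest value to the lowest. Instead it sorts by the leading digit, going from 9.9 down to 0, even though there are clearly players scoring more than 10.0 points per game.
[Image: example of the DataFrame sorted by points]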
Are the numbers stored in the DataFrame actually strings, so that string comparison is causing this problem?
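That is very likely the cause. Below is a small illustration using a hypothetical frame instead of the real scraped data. Strings are compared character by character, so "9.9" ranks above "27.3" because "9" > "2". This probably also explains why sorting by Age looked correct: NBA ages are all two-digit numbers, and for digit-only strings of equal length, text order and numeric order agree.

```python
import pandas as pd

# Hypothetical points-per-game values stored as text, the way td.text returns them
df = pd.DataFrame({"Player": ["A", "B", "C"], "PTS": ["9.9", "27.3", "0.0"]})

# Sorting text compares character by character, so "9.9" > "27.3" because "9" > "2"
as_text = df.sort_values("PTS", ascending=False)["PTS"].tolist()
print(as_text)  # ['9.9', '27.3', '0.0']

# Sorting the same values as numbers gives the expected order
as_numbers = df["PTS"].astype(float).sort_values(ascending=False).tolist()
print(as_numbers)  # [27.3, 9.9, 0.0]
```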
Here is the code I'm running:
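from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

year = 2021
# URL of the page we will be scraping (see image above)
url = "https://www.basketball-reference.com/leagues/NBA_{}_per_game.html".format(year)
# This is the HTML from the given URL
html = urlopen(url)
soup = BeautifulSoup(html, features="html.parser")
# Every player row in the per-game table has the class "full_table"
table = soup.find_all(class_="full_table")
# The header row; its text contains one column name per line
head = soup.find(class_="thead")
headers_raw = head.text
# Split the header text into column names; the slice [2:-1] trims leading and
# trailing entries so the names line up with the <td> cells of each row
headers = headers_raw.replace("\n", ",").split(",")[2:-1]
players = []
for row in table:
    # td.text is always a string, so every value stored here is text
    player = [td.text for td in row.find_all("td")]
    players.append(player)
stats = pd.DataFrame(players, columns=headers)
sorted_by_points = stats.sort_values("PTS", ascending=False)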
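Because every cell is built from td.text, every column of stats holds strings, not only PTS but also AST, BLK and Age. The following sketch converts all stat columns at once. The helper to_numeric_columns is hypothetical, the rows below are made-up sample data, and it assumes that the text columns in the real table are Player, Pos and Tm.

```python
import pandas as pd


def to_numeric_columns(df, text_columns):
    """Hypothetical helper: return a copy of df with every column not listed in
    text_columns converted to numbers; unparseable cells (e.g. "") become NaN."""
    out = df.copy()
    for col in out.columns:
        if col not in text_columns:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Made-up scraped rows: every value is a string, and one 3P% cell is empty
stats = pd.DataFrame(
    [["Player A", "C", "24", "0.0", "", "8.1", "2.3"],
     ["Player B", "PG", "29", "2.1", ".385", "25.4", "0.4"]],
    columns=["Player", "Pos", "Age", "3PA", "3P%", "PTS", "BLK"],
)

# For the real scraped frame this would be {"Player", "Pos", "Tm"} (assumption)
numeric = to_numeric_columns(stats, text_columns={"Player", "Pos"})
print(numeric.dtypes)
print(numeric.sort_values("BLK", ascending=False))
```

With errors="coerce", a cell that cannot be parsed is stored as NaN instead of raising an error, which matters for columns that can contain empty strings.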
3 Answers
Answer 1:
Yes, it sounds like they are strings. You can check with the dtypes attribute (stats.dtypes, without parentheses).
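Here is a minimal sketch of that check, using a hypothetical frame built the same way as in the question, with every cell stored as a string. Older pandas versions report such columns as object, and newer versions may report a string dtype. Either way, they are not numeric.

```python
import pandas as pd

# Hypothetical frame built like the scraped one: every cell is a string
stats = pd.DataFrame([["Player A", "25", "25.0"], ["Player B", "22", "9.9"]],
                     columns=["Player", "Age", "PTS"])

# dtypes is an attribute, not a method; text columns are not a numeric dtype
print(stats.dtypes)
print(pd.api.types.is_numeric_dtype(stats["PTS"]))  # False
```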
Answer 2:
Yes, convert the PTS column to float before sorting.
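Here is a minimal sketch of that conversion. It uses a small hypothetical frame in place of the scraped stats, with PTS values stored as strings, and the column is converted with astype(float) before sorting.

```python
import pandas as pd

# Hypothetical stand-in for the scraped frame: PTS values are strings
stats = pd.DataFrame({"Player": ["A", "B", "C"], "PTS": ["9.9", "27.3", "12.0"]})

# Convert the column to float, then sort numerically from highest to lowest
stats["PTS"] = stats["PTS"].astype(float)
sorted_by_points = stats.sort_values("PTS", ascending=False)
print(sorted_by_points)
```

astype(float) raises a ValueError if any cell is an empty string. If a column might contain blanks, pd.to_numeric(stats["PTS"], errors="coerce") converts those cells to NaN instead.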
Answer 3:
compare_df,results_dict_cat,results_dict_num=comparison_summarize(df)