How can I extract the URLs and specific columns from this page using Python?

Posted by sd2nnvve on 2022-10-23 in Python

https://training.lczero.org/networks/?show_all=1
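I want to extract the columns named Number, Run, Network, Elo, and Games from this site. I can do that with pandas, but pd.read_html() does not extract the href values needed to download the data. I tried BeautifulSoup but got nowhere: I managed to collect all the URLs, but I also need the other columns to make sense of them. Can anyone help?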

xxls0lw8 #1

Try:

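import io

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://training.lczero.org/networks/?show_all=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# read_html returns a list of DataFrames; the networks table is the first one.
# Wrapping the markup in StringIO avoids the deprecation warning that newer
# pandas versions raise for literal HTML strings.
df = pd.read_html(io.StringIO(str(soup)))[0]

# Assumes every table row contains exactly one link, so links and rows line up.
df["links"] = [
    "https://training.lczero.org" + a["href"] for a in soup.select("td > a")
]

print(df.head())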

Prints:

Number  Run   Network     Elo  Games  Blocks  Filters                        Time  Ordo Elo                                                                                                         links
0  805799    1  a13e6d41  141.26  12533      15      512  2022-10-22 12:33:33 +00:00         0  https://training.lczero.org/get_network?sha=a13e6d412e4d7a113ca604647a6f56845ad280b5584ede96ca6a7658dba7f897
1  805798    1  d6eea775  138.51  63008      15      512  2022-10-22 11:57:32 +00:00         0  https://training.lczero.org/get_network?sha=d6eea77581d45a0e3bc46203baa10eb94b7e345e15c246f0d18b98b9d5d425f6
2  805797    1  cdffe453  133.00  65478      15      512  2022-10-22 11:20:34 +00:00       133  https://training.lczero.org/get_network?sha=cdffe45321e8a843eabc7c6ee71254647b31b5a8798440035ee2b222acc3162a
3  805796    1  6271053e  131.00  66486      15      512  2022-10-22 10:43:30 +00:00       131  https://training.lczero.org/get_network?sha=6271053e90de21c67a25ba23981d8f03e888a4f7afe543f736a057ebb5d07fec
4  805795    1  0b03a5b0  136.00  63894      15      512  2022-10-22 10:07:32 +00:00       136  https://training.lczero.org/get_network?sha=0b03a5b0dbc019e936f075e6f5eacc603d888970e56bb12c6e747b05fda09b86
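
The answer above attaches the links as a separate list, which only lines up if every row has exactly one link. A possible variation, sketched here under assumptions, walks the table row by row so each link stays with its own row, and keeps only the columns asked for in the question. The function name `table_with_links` and its parameters are hypothetical, and the sketch assumes the first `<table>` on the page is the networks table with `<th>` header cells.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://training.lczero.org"
WANTED = ["Number", "Run", "Network", "Elo", "Games"]


def table_with_links(html, base_url=BASE_URL, wanted=WANTED):
    """Parse the first <table> in `html` row by row (hypothetical helper).

    Returns a DataFrame with the `wanted` columns plus a "links" column.
    Cell values are kept as strings; rows without a link get a missing value.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    # Column names are taken from the table's <th> cells.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]

    records = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:  # the header row has no <td> cells, so skip it
            continue
        record = dict(zip(headers, (td.get_text(strip=True) for td in cells)))
        # Take the first link in this row, if any, and make it absolute.
        a = tr.find("a", href=True)
        record["links"] = urljoin(base_url, a["href"]) if a else None
        records.append(record)

    return pd.DataFrame(records)[wanted + ["links"]]
```

To use it on the live page, pass in the downloaded markup, for example `table_with_links(requests.get(url).text)`. Because the values come back as strings, columns such as Elo or Games would still need `pd.to_numeric` before any arithmetic.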
