pandas: How to web-scrape multiple tables on the same website

Posted by kognpnkq on 2023-09-29 in Other

I am trying to web-scrape multiple tables from this website.
Here is my code:

import pandas as pd
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_ranking(url, sheet_name):
    with sync_playwright() as pw:
        browser = pw.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        soup = BeautifulSoup(page.content(), "html.parser")
        table = soup.select_one(".table_bd")
        print("done step 1")

        if table is None:
            print("Table not found.")
        else:
            df = pd.read_html(str(table))[0]
            print(df)
            with pd.ExcelWriter("jockeyclub.xlsx", engine="openpyxl", mode='a', if_sheet_exists='overlay') as writer:
                df.to_excel(writer, sheet_name=sheet_name, index=True, startrow = 70)

url_trainer = "https://racing.hkjc.com/racing/information/english/racing/Draw.aspx#race1.aspx"
scrape_ranking(url_trainer, "Race Card 1")

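This code is able to print the table for Race Card 1. However, when I change the line to df = pd.read_html(str(table))[1] or df = pd.read_html(str(table))[2], it cannot find any other tables on the website.
Is there a way to print all the tables on the website?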

pandas

Source: https://stackoverflow.com/questions/77105189/how-to-webscrape-multiple-tables-on-same-website

1 Answer

iyr7buue1#

In this case there seems to be no need to mix modules. Simply use pandas.read_html(), select the tables with attrs, and iterate over the resulting list of DataFrames:

import pandas as pd

url_trainer = "https://racing.hkjc.com/racing/information/english/racing/Draw.aspx#race1.aspx"

for table in pd.read_html(url_trainer, attrs={'class':'table_bd'}):
    print(table)
    # other tasks you have to perform on the dataframes
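
To try the attrs filter without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML markup and the tables_by_class helper are made up for illustration and do not reproduce the real page; only the class name table_bd comes from the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in markup; the real page's HTML is not reproduced here.
SAMPLE_HTML = """
<table class="table_bd">
  <tr><th>Horse</th><th>Draw</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
<table class="other">
  <tr><th>Note</th></tr>
  <tr><td>ignored</td></tr>
</table>
<table class="table_bd">
  <tr><th>Horse</th><th>Draw</th></tr>
  <tr><td>B</td><td>2</td></tr>
</table>
"""


def tables_by_class(html, css_class):
    """Return one DataFrame per table whose class attribute matches css_class."""
    # StringIO makes the HTML string look like a file, which read_html accepts.
    return pd.read_html(StringIO(html), attrs={"class": css_class})


tables = tables_by_class(SAMPLE_HTML, "table_bd")
for number, table in enumerate(tables, start=1):
    print(f"Table {number}")
    print(table)
```

read_html always returns a list, so indexing with [1] or [2] only works when the input actually contains that many matching tables.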

EDIT

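With the URL you provided, the first part will work. However, if you leave off the #race1... at the end of the URL, the website behaves slightly differently; providing a user-agent solves this issue: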

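from io import StringIO

import pandas as pd
import requests

url_trainer = "https://racing.hkjc.com/racing/information/english/racing/Draw.aspx"

# Send a browser-like user-agent with the request
response = requests.get(url_trainer, headers={'user-agent': 'Mozilla/5.0'})

# Wrap the HTML in StringIO: newer pandas versions deprecate passing a literal HTML string
list_of_df = pd.read_html(StringIO(response.text), attrs={'class': 'table_bd'})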

for table in list_of_df:
    print(table)
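
If you would rather keep the Playwright and BeautifulSoup approach from the question, the likely reason [1] and [2] fail is that soup.select_one() returns only the first matching element, so pd.read_html(str(table)) sees a single table. A minimal sketch using soup.select() instead is below. SAMPLE_HTML stands in for page.content(), and the frames_from_rendered_html helper is a name I made up; the real page markup is an assumption.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content(); not the real page markup.
SAMPLE_HTML = """
<html><body>
<table class="table_bd">
  <tr><th>Horse</th><th>Draw</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
<table class="table_bd">
  <tr><th>Horse</th><th>Draw</th></tr>
  <tr><td>B</td><td>2</td></tr>
</table>
</body></html>
"""


def frames_from_rendered_html(html, selector=".table_bd"):
    """Parse every element matching selector into its own DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    frames = []
    # select() returns all matches; select_one() would return only the first.
    for table in soup.select(selector):
        frames.extend(pd.read_html(StringIO(str(table))))
    return frames


frames = frames_from_rendered_html(SAMPLE_HTML)
for number, df in enumerate(frames, start=1):
    print(f"Race Card {number}")
    print(df)
```

Each DataFrame in frames could then be written to its own sheet with df.to_excel(writer, sheet_name=...), as in the question's code.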

Posted 2023-09-29
