How can I web-scrape tables onto two different worksheets of an Excel spreadsheet?

pinkon5k  posted on 2023-04-07  in Other

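This is my code for scraping the tables from these two links. It runs without crashing.
https://racing.hkjc.com/racing/information/English/Jockey/JockeyRanking.aspx
https://racing.hkjc.com/racing/information/English/Trainers/TrainerRanking.aspx
However, when I run it, the two tables seem to overwrite each other and end up on the same worksheet instead of on two different worksheets. Is there any way to fix this?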

import pandas as pd
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_ranking(url, sheet_name):
    with sync_playwright() as pw:
        browser = pw.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        soup = BeautifulSoup(page.content(), "html.parser")
        table = soup.select_one(".table_bd")

        if table is None:
            print("Table not found.")
        else:
            df = pd.read_html(str(table))[0]
            df.to_excel("hkjc.xlsx", sheet_name=sheet_name, index=True)

# Scrape TrainerRanking page
url_trainer = "https://racing.hkjc.com/racing/information/English/Trainers/TrainerRanking.aspx"
scrape_ranking(url_trainer, "TrainerRanking")

# Scrape JockeyRanking page
url_jockey = "https://racing.hkjc.com/racing/information/English/Jockey/JockeyRanking.aspx"
scrape_ranking(url_jockey, "JockeyRanking")

print("done")

abithluo  #1

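Passing a file path to df.to_excel creates a fresh workbook on every call, so the second call overwrites what the first one wrote. Try using ExcelWriter in append mode instead: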

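            # Replaces the two lines in the else branch of scrape_ranking.
            df = pd.read_html(str(table))[0]
            # mode="a" opens the existing workbook and adds to it instead of recreating it.
            with pd.ExcelWriter("hkjc.xlsx",
                                engine="openpyxl",
                                mode="a", if_sheet_exists="new") as writer:
                df.to_excel(writer, sheet_name=sheet_name, index=True)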
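
One caveat: append mode expects hkjc.xlsx to exist already, so the very first call can fail on a clean directory. Below is a minimal sketch of a helper that falls back to write mode when the file is missing. The name `append_sheet` and its parameters are made up for illustration, and it assumes openpyxl is installed and a pandas version recent enough to support if_sheet_exists.

```python
import os

import pandas as pd


def append_sheet(df, path, sheet_name):
    """Write df to sheet_name in the workbook at path (hypothetical helper)."""
    if os.path.exists(path):
        # Existing workbook: append so the other sheets are kept.
        # "replace" means a re-run overwrites the same sheet instead of
        # adding a renamed copy.
        with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                            if_sheet_exists="replace") as writer:
            df.to_excel(writer, sheet_name=sheet_name, index=True)
    else:
        # First call: there is no file to append to, so create it.
        # if_sheet_exists is only accepted in append mode, hence two branches.
        with pd.ExcelWriter(path, engine="openpyxl") as writer:
            df.to_excel(writer, sheet_name=sheet_name, index=True)
```

Inside scrape_ranking you would then call append_sheet(df, "hkjc.xlsx", sheet_name) in place of the df.to_excel line.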

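With if_sheet_exists='new', a sheet whose name is already taken is written as a new sheet under a name chosen by the engine. With if_sheet_exists='replace', an existing sheet named sheet_name is overwritten. With if_sheet_exists='overlay', the data is written onto the existing sheet without clearing it first, so old cells outside the new range stay in place.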
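
If you would rather not depend on append mode at all, another option is to collect both tables first and write them through a single ExcelWriter. This is only a sketch: `write_sheets` is a hypothetical helper, and it assumes scrape_ranking is changed to return the DataFrame instead of writing it. It also assumes openpyxl is installed.

```python
import pandas as pd


def write_sheets(frames, path):
    """Write each DataFrame in frames (sheet name -> DataFrame) to its own sheet."""
    # One writer for the whole workbook: the file is created once, and every
    # to_excel call adds a sheet to it rather than replacing the file.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for sheet_name, df in frames.items():
            df.to_excel(writer, sheet_name=sheet_name, index=True)
```

With the two scraped tables in variables such as df_trainer and df_jockey, the call would be write_sheets({"TrainerRanking": df_trainer, "JockeyRanking": df_jockey}, "hkjc.xlsx"). Each run rebuilds the workbook from scratch, so there are no leftover sheets from earlier runs to worry about.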
