pandas: Unable to extract Yahoo Finance data

Posted by gk7wooem on 2023-06-20 in Other

`from selenium import webdriver
import pandas as pd
import re

# Read the Excel file with the links
df = pd.read_excel('file.xlsx')

# Create empty lists to store the extracted data
company_names = []
earnings_dates = []

# Set up the Selenium driver
driver = webdriver.Chrome()

# Iterate over the links in the DataFrame
for index, row in df.iterrows():
    url = row['Link']  # Assuming the links are in column 'Link'

    # Load the URL in the browser
    driver.get(url)

    # Extract the company name using regular expressions
    try:
        html_content = driver.page_source
        match = re.search(r'<h1 class="D\(ib\) Fz\(18px\)">(.*?)</h1>', html_content)
        if match:
            company_name = match.group(1)
        else:
            company_name = 'Company name not found'
    except:
        company_name = 'Company name not found'

    # Extract the earnings date
    try:
        earnings_date_element = driver.find_element_by_xpath('//td[contains(text(), "Earnings Date")]/following-sibling::td')
        earnings_date = earnings_date_element.text.strip()
    except:
        earnings_date = 'Earnings date not found'

    # Append the extracted data to the lists
    company_names.append(company_name)
    earnings_dates.append(earnings_date)

# Close the Selenium driver
driver.quit()

# Create a new DataFrame with the extracted data
df_extracted = pd.DataFrame({'Link': df['Link'], 'Company Name': company_names, 'Earnings Date': earnings_dates})

# Print the extracted data
print(df_extracted)`

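With the code above I can extract the company name, but not the earnings date.
For https://finance.yahoo.com/quote/A?p=A&.tsrc=fin-srch I am trying to extract the following result: **Agilent Technologies, Inc. (A)** Earnings Date Aug 14, 2023 - Aug 18, 2023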

pandas

Source: https://stackoverflow.com/questions/76399692/unable-to-extract-yahoo-finance-data

1 Answer

gcmastyq1#

On the Yahoo Finance page, the company name is in the only <h1> tag on the page.

Solution

To extract the ***company name*** and the ***earnings date***, you can use the following locator strategies:

driver.get("https://finance.yahoo.com/quote/A?p=A&.tsrc=fin-srch")
print(driver.find_element(By.CSS_SELECTOR, "h1").text)
print(driver.find_element(By.XPATH, "//td[.//span[text()='Earnings Date']]//following-sibling::td[1]").text)

Console output:

Agilent Technologies, Inc. (A)
Aug 14, 2023 - Aug 18, 2023
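
One likely reason the question's XPath found nothing: `contains(text(), "Earnings Date")` only inspects the text nodes that are direct children of the `<td>`, while the locator above implies the label sits inside a nested `<span>`. The sketch below illustrates that difference with the standard library's `xml.etree.ElementTree` instead of Selenium. The table row is a hypothetical, simplified stand-in, not Yahoo's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified row: the label is wrapped in a <span> inside the first <td>.
ROW = """
<table>
  <tr>
    <td><span>Earnings Date</span></td>
    <td>Aug 14, 2023 - Aug 18, 2023</td>
  </tr>
</table>
"""

row = ET.fromstring(ROW).find("tr")
label_cell, value_cell = row.findall("td")

# Direct text of the <td>, which is what XPath text() on the <td> would see.
direct_text = label_cell.text or ""
# Text of the <td> including all of its descendants.
full_text = "".join(label_cell.itertext())

print(repr(direct_text))  # '' (the label is not a direct text node of the <td>)
print(repr(full_text))    # 'Earnings Date'
print(value_cell.text)    # Aug 14, 2023 - Aug 18, 2023
```

Separately, `find_element_by_xpath` was removed in Selenium 4.3.0, so on a recent Selenium that call raises `AttributeError`, and the bare `except:` in the question would hide it behind the 'Earnings date not found' fallback.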

Note: you need to add the following import:

from selenium.webdriver.common.by import By

Answered on 2023-06-20
