Unable to extract Yahoo Finance data with pandas

Posted by gk7wooem on 2023-06-20 in Other
```python
from selenium import webdriver
import pandas as pd
import re

# Read the Excel file with the links
df = pd.read_excel('file.xlsx')

# Create empty lists to store the extracted data
company_names = []
earnings_dates = []

# Set up the Selenium driver
driver = webdriver.Chrome()

# Iterate over the links in the DataFrame
for index, row in df.iterrows():
    url = row['Link']  # Assuming the links are in column 'Link'

    # Load the URL in the browser
    driver.get(url)

    # Extract the company name using regular expressions
    try:
        html_content = driver.page_source
        match = re.search(r'<h1 class="D\(ib\) Fz\(18px\)">(.*?)</h1>', html_content)
        if match:
            company_name = match.group(1)
        else:
            company_name = 'Company name not found'
    except:
        company_name = 'Company name not found'

    # Extract the earnings date
    try:
        earnings_date_element = driver.find_element_by_xpath('//td[contains(text(), "Earnings Date")]/following-sibling::td')
        earnings_date = earnings_date_element.text.strip()
    except:
        earnings_date = 'Earnings date not found'

    # Append the extracted data to the lists
    company_names.append(company_name)
    earnings_dates.append(earnings_date)

# Close the Selenium driver
driver.quit()

# Create a new DataFrame with the extracted data
df_extracted = pd.DataFrame({'Link': df['Link'], 'Company Name': company_names, 'Earnings Date': earnings_dates})

# Print the extracted data
print(df_extracted)
```

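With the code above I can extract the company name, but not the earnings date.
For https://finance.yahoo.com/quote/A?p=A&.tsrc=fin-srch, I am trying to extract the following result: **Agilent Technologies, Inc. (A)**, Earnings Date: Aug 14, 2023 - Aug 18, 2023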

gcmastyq


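On the Yahoo Finance page, the company name is in the page's only <h1> tag, while the "Earnings Date" label sits inside a <span> within its <td>. Because text() only selects a node's direct text children, contains(text(), "Earnings Date") applied to the <td> does not match that label.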
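To make the text() versus nested-<span> difference concrete without a browser, here is a small stdlib-only analogue using xml.etree.ElementTree. The markup is a hypothetical, simplified stand-in shaped like the structure the locator below implies; it is not Yahoo's real HTML (which is not well-formed XML anyway). The helper find_value_after_label is made up for illustration. ElementTree has no following-sibling axis, so it pairs each cell with its next sibling by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT copied from Yahoo Finance, only
# shaped like the structure the answer's XPath implies.
SNIPPET = """
<div>
  <h1>Agilent Technologies, Inc. (A)</h1>
  <table>
    <tr><td><span>Earnings Date</span></td><td>Aug 14, 2023 - Aug 18, 2023</td></tr>
  </table>
</div>
"""


def cell_text(cell):
    """Concatenate all text inside an element, roughly like Selenium's .text."""
    return "".join(cell.itertext()).strip()


def find_value_after_label(root, label):
    """Hypothetical helper: text of the <td> right after the <td> whose <span> equals label."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Pair every cell with the cell that follows it in the same row
        for label_cell, value_cell in zip(cells, cells[1:]):
            span = label_cell.find("span")
            if span is not None and span.text == label:
                return cell_text(value_cell)
    return None


root = ET.fromstring(SNIPPET.strip())
label_cell = root.find(".//td[span='Earnings Date']")

# The <td> has no direct text of its own; the label lives in the child <span>.
# This mirrors why contains(text(), "Earnings Date") on the <td> finds nothing.
print(repr(label_cell.text))                           # prints None
print(root.find(".//h1").text)                         # prints Agilent Technologies, Inc. (A)
print(find_value_after_label(root, "Earnings Date"))   # prints Aug 14, 2023 - Aug 18, 2023
```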

Solution

To extract the ***company name*** and the ***earnings date***, you can use the following locator strategies:

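```python
driver.get("https://finance.yahoo.com/quote/A?p=A&.tsrc=fin-srch")
# The company name is the text of the page's only <h1>
print(driver.find_element(By.CSS_SELECTOR, "h1").text)
# The date is in the <td> right after the <td> whose <span> reads "Earnings Date"
print(driver.find_element(By.XPATH, "//td[.//span[text()='Earnings Date']]/following-sibling::td[1]").text)
```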

Console output:

```
Agilent Technologies, Inc. (A)
Aug 14, 2023 - Aug 18, 2023
```
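If you want the earnings date as real dates in the DataFrame rather than a string, a minimal sketch is to split the range and parse each half with datetime.strptime. It assumes the "Mon DD, YYYY - Mon DD, YYYY" format shown in the output above and an English/C locale for %b; parse_earnings_range is a hypothetical helper name. A placeholder such as 'Earnings date not found' from the question's loop would raise ValueError, so filter those first.

```python
from datetime import datetime


def parse_earnings_range(text):
    """Hypothetical helper: turn 'Aug 14, 2023 - Aug 18, 2023' into (start, end) dates."""
    parts = [part.strip() for part in text.split(" - ")]
    # %b expects an abbreviated English month name under the default C locale
    dates = [datetime.strptime(part, "%b %d, %Y").date() for part in parts]
    # A single date (no " - ") is treated as a one-day range
    return dates[0], dates[-1]


print(parse_earnings_range("Aug 14, 2023 - Aug 18, 2023"))
# prints (datetime.date(2023, 8, 14), datetime.date(2023, 8, 18))
```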

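Note: you need to add the following import. The find_element_by_xpath() helper used in the question was removed in Selenium 4.3, so find_element(By.XPATH, ...) is the form to use; with the question's bare except, the resulting AttributeError is silently reported as "Earnings date not found".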

```python
from selenium.webdriver.common.by import By
```
