Python: Why isn't the "Growth Estimates" table on this website detected by BeautifulSoup?

vkc1a9a2, posted on 2023-06-04 in Python

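I'm trying to scrape the "Growth Estimates" table from the URL below using BeautifulSoup and requests, but the table is not being picked up. When I use the browser's inspect tool I can see a table to pull the data from, and I can't see anything indicating that it is loaded dynamically, though I may be wrong.
url = https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL
Can someone explain the problem and suggest a solution?
Thank you!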

import requests
from bs4 import BeautifulSoup

def get_growth_data(symbol):
    url = f"https://finance.yahoo.com/quote/{symbol}/analysis?p={symbol}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the table containing the growth data
    table = soup.find("table", class_="W(100%) M(0) BdB Bdc($seperatorColor) Mb(25px)")

    if table is None:
        print("Table not found.")
        return []

    # Extract the growth values from the table
    growth_values = []
    rows = table.find_all("tr")
    for row in rows:
        columns = row.find_all("td")
        if len(columns) >= 2:
            growth_values.append(columns[1].text)

    return growth_values

symbol = 'AAPL'
growth_data = get_growth_data(symbol)
print(growth_data)

5kgi1eie #1

To get the correct response from the server, set the User-Agent HTTP header in the request:

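from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL'
# Send a browser-like User-Agent so the server returns the expected page
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

# Select the table whose text contains "Growth Estimates"
table = soup.select_one('table:-soup-contains("Growth Estimates")')
# Wrap the markup in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(table)))[0]

print(df)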

Prints:

           Growth Estimates    AAPL  Industry  Sector(s)  S&P 500
0              Current Qtr.  -0.80%       NaN        NaN      NaN
1                 Next Qtr.   5.40%       NaN        NaN      NaN
2              Current Year  -2.30%       NaN        NaN      NaN
3                 Next Year   9.90%       NaN        NaN      NaN
4  Next 5 Years (per annum)   8.02%       NaN        NaN      NaN
5  Past 5 Years (per annum)  23.64%       NaN        NaN      NaN
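
The values in the output above are strings such as "-0.80%", so they need converting before any arithmetic. Here is a minimal sketch of one way to do that. The helper name `percent_to_float` is hypothetical and not part of pandas or BeautifulSoup. It assumes every value is either a percentage string or missing, and the sample list is simply copied from the AAPL column printed above.

```python
import math


def percent_to_float(value):
    """Convert a string such as '-0.80%' to a fraction (about -0.008).

    Missing or unparseable values (None, NaN, 'N/A', ...) become NaN.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    # Drop surrounding whitespace, the trailing percent sign, and thousands separators
    text = str(value).strip().rstrip("%").replace(",", "")
    try:
        return float(text) / 100.0
    except ValueError:
        return math.nan


# Sample values copied from the AAPL column in the output above
aapl_growth = ["-0.80%", "5.40%", "-2.30%", "9.90%", "8.02%", "23.64%"]
growth_fractions = [percent_to_float(v) for v in aapl_growth]
print(growth_fractions)
```

With the DataFrame from the answer, the same helper could be applied to a whole column, for example `df["AAPL"].map(percent_to_float)`.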
