python-3.x - Trying to scrape a specific table, but getting no results

Posted by 2uluyalo on 2023-01-10 in Python

I have tried three different techniques to scrape a table called 'table-light', but none of them works. The code below shows my attempts to extract the data.

import pandas as pd
tables = pd.read_html('https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap')
tables

############################################################################

import requests
import pandas as pd
url = 'https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[10]
print(df)

############################################################################

import requests
from bs4 import BeautifulSoup
url = "https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap"
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "html.parser")
table = soup.find_all('table-light')
print(table)

The table I am trying to extract data from is called 'table-light'. I want to get all of the columns and all 144 rows. How can I do this?
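One reason the third attempt prints an empty list: BeautifulSoup's `find_all('table-light')` looks for elements whose *tag name* is `table-light`, whereas `table-light` is a CSS class on a `<table>` element (the answer below selects it with `.table-light`). A minimal offline sketch, using a made-up HTML snippet rather than the real finviz page, shows the difference between matching by tag name and by class:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real page:
# "table-light" is a class attribute, not a tag name.
SAMPLE_HTML = """
<table class="table-light">
  <tr><td>No.</td><td>Name</td></tr>
  <tr><td>1</td><td>Real Estate - Development</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Looks for <table-light> tags: there are none, so this is an empty list.
by_tag_name = soup.find_all("table-light")

# Looks for <table> tags that carry the CSS class "table-light".
by_class = soup.find_all("table", class_="table-light")

# Equivalent CSS-selector lookup.
by_selector = soup.select(".table-light")

print(len(by_tag_name), len(by_class), len(by_selector))  # 0 1 1
```

Note that fixing the lookup alone may still return nothing if the site serves a captcha page instead of the table.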

ssgvzors (answer #1)

You can try setting the User-Agent header so that you get the real HTML (instead of a captcha page):

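import pandas as pd
import requests
from bs4 import BeautifulSoup
from io import StringIO

url = "https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap"

# Send a browser-like User-Agent; without it the site may return a captcha page instead of the table
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "lxml") # <-- don't use html.parser here

# "table-light" is a CSS class, so select it with a CSS selector
table = soup.select_one(".table-light")

# Turn the first row's <td> cells into <th> so read_html uses them as column headers
for td in table.tr.select('td'):
    td.name = 'th'

# Wrap the HTML in StringIO: newer pandas versions deprecate passing a literal HTML string to read_html
df = pd.read_html(StringIO(str(table)))[0]
print(df.head())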
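If the site still answers with a captcha or block page, `soup.select_one(".table-light")` returns `None` and the code above fails with an `AttributeError` on `table.tr`. The `td` to `th` rename exists only so that `read_html` treats the first row as column headers. A minimal sketch under those assumptions, with a hypothetical helper name `table_light_to_frame`, fails with a clear message instead and builds the DataFrame directly from the cell texts, so it does not go through `read_html` at all. It parses with `html.parser` by default so the demo needs no extra packages; per the answer, pass `"lxml"` for the real finviz page.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_light_to_frame(html: str, parser: str = "html.parser") -> pd.DataFrame:
    """Hypothetical helper: convert the first .table-light table into a DataFrame.

    Assumes the first row of the table holds the column names.
    """
    soup = BeautifulSoup(html, parser)
    table = soup.select_one(".table-light")
    if table is None:
        # Usually means a captcha/block page came back instead of the real page.
        raise ValueError("no .table-light table found; check the User-Agent header")

    # Collect the text of every cell, row by row.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")
    ]
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


# Made-up sample mimicking the first rows of the table (all values stay strings).
sample = """
<table class="table-light">
  <tr><td>No.</td><td>Name</td><td>Market Cap</td></tr>
  <tr><td>1</td><td>Real Estate - Development</td><td>3.14B</td></tr>
  <tr><td>2</td><td>Textile Manufacturing</td><td>3.42B</td></tr>
</table>
"""
df = table_light_to_frame(sample)
print(df)
```

Unlike `read_html`, this keeps every value as a string (for example `"3.14B"`), so any numeric conversion is left to the caller.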

Prints:

No.                            Name Market Cap    P/E  Fwd P/E   PEG   P/S   P/B    P/C  P/FCF EPS past 5Y EPS next 5Y Sales past 5Y Change   Volume
0    1       Real Estate - Development      3.14B   3.21    21.12  0.24  0.60  0.52   2.28  17.11      43.30%      13.42%        13.69%  1.43%  715.95K
1    2           Textile Manufacturing      3.42B  32.58    25.04     -  1.43  2.58   9.88  90.16      15.31%      -0.49%         3.54%  1.83%  212.71K
2    3                     Coking Coal      5.31B   2.50     4.93  0.37  0.64  1.53   4.20   2.54      38.39%       6.67%        22.92%  5.43%    1.92M
3    4       Real Estate - Diversified      6.71B  17.38   278.89  0.87  2.78  1.51  15.09  91.64       0.48%      19.91%        11.97%  3.31%  461.33K
4    5  Other Precious Metals & Mining      8.10B  24.91    29.07  2.71  6.52  1.06  14.47  97.98      16.30%       9.19%        20.71%  0.23%    4.77M
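The output above comes from `df.head()`, which shows only the first five rows, while the question asks for all 144. Assuming the page returns every row in one table, one way to check the count and print the whole frame is pandas' display options. A sketch on a stand-in DataFrame (the real `df` would come from the answer's code):

```python
import pandas as pd

# Stand-in for the scraped frame: 144 rows, matching the count in the question.
df = pd.DataFrame(
    {"No.": range(1, 145), "Name": [f"Industry {i}" for i in range(1, 145)]}
)

print(len(df))  # 144

# Temporarily lift the row limit so print() shows every row instead of a truncated view.
with pd.option_context("display.max_rows", None):
    print(df)
```

`pd.option_context` restores the previous setting when the `with` block ends, so other output is not affected.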
