debugging: I can't seem to generate a DataFrame from a web page table

a1o7rhls posted on 2023-11-22 in Other

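I don't know where the problem is, but the code does not produce the DataFrame that should be retrieved from the web page. This is my first extraction project, and I can't seem to identify the problem.
Here is the code: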

import requests
import sqlite3
import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime 

url = 'https://en.wikipedia.org/wiki/List_of_largest_banks#By_market_capitalization'
db_name = 'Banks.db'
table_name = 'Largest_banks'
csv_path = '/home/project/Largest_banks_data.csv'
log_file = '/home/project/code_log.txt'  
table_attribs = {'Bank name': 'Name', 'Market Cap (US$ Billion)': 'MC_USD_Billion'}

###  Task 2 - Extract process

def extract(url, table_attribs):
    # Loading the webpage for scraping
    html_page = requests.get(url).text

    # Parse the HTML content of the webpage
    data = BeautifulSoup(html_page, 'html.parser')

    # Find the table with specified attributes
    # Find the main table containing the relevant data
    main_table = data.find('table', class_='wikitable sortable')

    # Find the desired `tbody` elements within the main table
    table_bodies = main_table.find_all('tbody', attrs=table_attribs)

    # Extract data from each `tbody` element
    extracted_data = []
    for table_body in table_bodies:
        rows = table_body.find_all('tr')
        for row in rows:
            extracted_data.append([cell.text for cell in row.find_all('td')])

    # Use pandas to create a DataFrame from the extracted data
    df = pd.DataFrame(extracted_data, columns=list(table_attribs.values()))

    return df

# Calling the extract function
df = extract(url, table_attribs)

if df is not None:
    # Print the result DataFrame
    print(df)
else:
    print("Extraction failed.")



yeotifhr1#

You can read the page directly with pandas:

tables = pd.read_html(html_page)

This loads 3 DataFrames, corresponding to the 3 tables on the page.
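
As a rough offline illustration of what `pd.read_html` returns, the sketch below parses a small inline HTML string instead of the live Wikipedia page. The sample markup, bank names, and figures are made up for the example, and it assumes a parser backend that `read_html` can use (`lxml`, or `bs4` plus `html5lib`) is installed. Passing a raw HTML string directly is deprecated in pandas 2.1 and later, hence the `StringIO` wrapper.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: two small tables.
html_page = """
<table class="wikitable">
  <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
  <tr><td>1</td><td>Bank A</td><td>400.5</td></tr>
  <tr><td>2</td><td>Bank B</td><td>230.25</td></tr>
</table>
<table class="wikitable">
  <tr><th>Rank</th><th>Bank name</th><th>Total assets</th></tr>
  <tr><td>1</td><td>Bank C</td><td>5000</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html_page))
first = tables[0]

# match= keeps only the tables whose text matches the given pattern.
by_cap = pd.read_html(StringIO(html_page), match="Market cap")

print(len(tables))
print(first)
print(len(by_cap))
```

Because the result is a plain list, the table of interest has to be picked by position or narrowed down with `match`; checking `len(tables)` first is a cheap way to see what was actually parsed.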

tables[0]


This prints the first table ("By market capitalization").
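
For completeness, here is one likely reason the original `extract` produced no data, shown on a made-up table rather than the live page. In BeautifulSoup, the `attrs` argument filters on HTML attributes of the tag, so passing the column-renaming dict `table_attribs` asks for `<tbody>` elements that carry attributes named `Bank name` and `Market Cap (US$ Billion)`, which nothing matches. The sketch below walks the `<tr>` rows directly instead. The sample markup, the helper name `extract_from_html`, and the column order (rank, name, market cap) are assumptions for illustration, not taken from the real page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the Wikipedia table.
sample_html = """
<table class="wikitable sortable">
  <tbody>
    <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
    <tr><td>1</td><td>Bank A</td><td>400.5</td></tr>
    <tr><td>2</td><td>Bank B</td><td>230.25</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", class_="wikitable")

# attrs= looks for HTML attributes with these names, so this finds nothing.
no_match = table.find_all("tbody", attrs={"Bank name": "Name"})
print(len(no_match))


def extract_from_html(html, columns):
    """Build a DataFrame from the first wikitable (assumed 3-column layout)."""
    parsed = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in parsed.find("table", class_="wikitable").find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Header rows contain only <th>, so they yield no <td> cells.
        if len(cells) >= 3:
            rows.append([cells[1], float(cells[2])])
    return pd.DataFrame(rows, columns=columns)


df = extract_from_html(sample_html, ["Name", "MC_USD_Billion"])
print(df)
```

Skipping rows without `<td>` cells also avoids a column-count mismatch when the DataFrame is built, since only the two wanted columns are kept from each row.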
