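I am trying to write a Python script that exports the Product Attributes table from the URL below to an Excel file (or CSV).
I wrote a script and tried several different class names, but I keep running into an error.
URL: https://www.digikey.com/en/products/detail/texas-instruments/uln2003aidre4/1912622
I don't know what causes this message, because I can export tables from other websites, yet my code fails on this one (and on Mouser.com as well).
My theory is that these two sites are blocking my script to keep their data from being exported, but I'm not sure.
- The table I want to export and its inspection in the browser (screenshot)
Here is my code: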

    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options


    def get_specifications_table(url):
        options = Options()
        options.add_argument('--headless')  # Run the browser in headless mode (no visible window)
        driver = webdriver.Chrome(options=options)
        driver.get(url)
        time.sleep(5)  # Add a delay to allow the webpage to load (adjust the time as needed)
        try:
            # Find the element with the class names "MuiTable-root css-u6unfi" and extract the table
            class_name = "MuiTable-root.css-u6unfi"
            table_element = driver.find_element("css selector", f".{class_name}")
            table_html = table_element.get_attribute('outerHTML')
            df = pd.read_html(table_html)[0]
            return df
        except Exception as e:
            print("Error:", e)
        finally:
            driver.quit()
        return None


    def export_to_excel(df, output_file):
        # The context manager saves and closes the file; writer.save() no longer exists in pandas 2.x
        with pd.ExcelWriter(output_file, engine='xlsxwriter') as writer:
            df.to_excel(writer, index=False)


    if __name__ == '__main__':
        url = "https://www.digikey.com/en/products/detail/texas-instruments/uln2003aidre4/1912622"
        output_excel_file = "Specifications_Table_Digikey.xlsx"
        print("Fetching the webpage and extracting the table...")
        specifications_df = get_specifications_table(url)
        if specifications_df is not None:
            print("Exporting the table to Excel...")
            export_to_excel(specifications_df, output_excel_file)
            print(f"Table 'Specifications' exported to '{output_excel_file}' successfully.")
        else:
            print("Table extraction or export failed.")

But I get this error:

    Fetching the webpage and extracting the table...
    Error: Message: no such element: Unable to locate element: {"method":"css selector","selector":".MuiTable-root.css-u6unfi"}
      (Session info: headless chrome=115.0.5790.110); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
    Stacktrace:
    Backtrace:
        GetHandleVerifier [0x004BA813+48355]
        (No symbol) [0x0044C4B1]
        (No symbol) [0x00355358]
        (No symbol) [0x003809A5]
        (No symbol) [0x00380B3B]
        (No symbol) [0x003AE232]
        (No symbol) [0x0039A784]
        (No symbol) [0x003AC922]
        (No symbol) [0x0039A536]
        (No symbol) [0x003782DC]
        (No symbol) [0x003793DD]
        GetHandleVerifier [0x0071AABD+2539405]
        GetHandleVerifier [0x0075A78F+2800735]
        GetHandleVerifier [0x0075456C+2775612]
        GetHandleVerifier [0x005451E0+616112]
        (No symbol) [0x00455F8C]
        (No symbol) [0x00452328]
        (No symbol) [0x0045240B]
        (No symbol) [0x00444FF7]
        BaseThreadInitThunk [0x772500C9+25]
        RtlGetAppContainerNamedObjectPath [0x77BC7B4E+286]
        RtlGetAppContainerNamedObjectPath [0x77BC7B1E+238]
    Table extraction or export failed.

2 Answers
Answer 1
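1. Check `driver.page_source` to see what the site is actually returning.
2. Based on that information, set a `user-agent` to avoid being blocked.
3. In this case, select the element directly with `pandas.read_html()` and specific attributes.

Example

Output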
| Type | Description |
| --- | --- |
| Category | Integrated Circuits (ICs)Power Management (PMIC)Power Distribution Switches, Load Drivers |
| Manufacturer | Texas Instruments |
| Series | ULx200xA |
| Package | Tape & Reel (TR) |
| Product Status | Discontinued at Digi-Key |
| Switch Type | Relay, Solenoid Driver |
| Number of Outputs | 7 |
| Ratio - Input:Output | 1:1 |
| Output Configuration | Low Side |
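
The three steps of this answer could be combined roughly as follows. This is only a minimal sketch, not necessarily the answerer's own code: the user-agent string, the block-page marker phrases, and the table `id` value `product-attributes` are assumptions chosen for illustration, and `looks_blocked` and `get_product_attributes` are hypothetical helper names. Check what `driver.page_source` really contains before relying on any of them.

```python
from io import StringIO

# Assumption: a typical desktop Chrome user-agent; any realistic value may work.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
)

# Assumption: phrases that often show up on bot-block pages; adjust them
# after looking at the real page_source.
BLOCK_MARKERS = ("access denied", "captcha", "unusual traffic")


def looks_blocked(page_source):
    """Return True if the HTML looks like a block page (simple heuristic)."""
    lowered = page_source.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def get_product_attributes(url, table_id="product-attributes"):
    """Load the page with a custom user-agent and parse one table by its id."""
    # Imported here so looks_blocked() can be used without Selenium or pandas.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    # Step 2: present a regular browser user-agent instead of the headless one.
    options.add_argument(f"--user-agent={USER_AGENT}")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Step 1: look at what the site actually served.
        html = driver.page_source
        if looks_blocked(html):
            raise RuntimeError("The site appears to have served a block page.")
        # Step 3: let pandas select the table whose attributes match.
        return pd.read_html(StringIO(html), attrs={"id": table_id})[0]
    finally:
        driver.quit()
```

Calling `get_product_attributes(url)` with the URL from the question returns a DataFrame if a matching table is found; `pandas.read_html()` raises a `ValueError` when no table matches the given attributes, which is itself a useful hint that the `id` assumption needs adjusting.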
Answer 2
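To scrape the data from the *Product Attributes* table on the page ULN2003AIDRE4 Texas Instruments | Integrated Circuits (ICs) | DigiKey, you need to induce WebDriverWait for visibility_of_element_located() on the <table> element and then load the result into a Pandas DataFrame. You can use the following locator strategy:

Code block:

Console output: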
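
The explicit-wait approach this answer describes might look like the sketch below. It is an illustration under stated assumptions rather than the answerer's original code: the XPath that finds the first table after a "Product Attributes" heading is a guess about the page structure, and `product_attributes_locator` and `scrape_table` are hypothetical helper names.

```python
def product_attributes_locator(heading="Product Attributes"):
    """Build an XPath locator for the first <table> after the given heading."""
    # Assumption: the heading text sits in an element that precedes the table.
    xpath = f"//*[normalize-space(text())='{heading}']/following::table[1]"
    return ("xpath", xpath)  # "xpath" is the value of selenium's By.XPATH


def scrape_table(url, timeout=20):
    """Wait until the table is visible, then parse its HTML into a DataFrame."""
    # Imported here so the locator helper works without Selenium or pandas.
    from io import StringIO

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Replaces a fixed time.sleep(): returns as soon as the table is
        # visible and raises TimeoutException after `timeout` seconds.
        table = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located(product_attributes_locator())
        )
        return pd.read_html(StringIO(table.get_attribute("outerHTML")))[0]
    finally:
        driver.quit()
```

Compared with the fixed `time.sleep(5)` in the question, the explicit wait only blocks as long as needed. It does not by itself get around a site that blocks headless browsers, so a timeout here is worth investigating with `driver.page_source`.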
References
You can find a couple of relevant detailed discussions in: