How can I use Python to access and assemble data from essentialoils.org/db?

k75qkfdt  posted on 2023-05-20  in Python

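I'm working on a project involving essential oils and need help using Python to access data from the website essentialoils.org/db. Specifically, I want to retrieve the individual data sheets from the database and assemble them into a single CSV file for further analysis.
I have the necessary credentials to access the essentialoils.org/db database, and I want to automate fetching the data and merging it into one file. Opening the link essentialoils.org/db shows the dataset, and clicking on each element separately opens the data sheet for that particular essential oil.
My goal is to write a Python script that logs in to the website with my credentials, navigates to each data sheet, retrieves the data, and saves it in CSV format. I'm familiar with web scraping using libraries such as BeautifulSoup and Selenium, but I'm not sure how to handle the authentication process and navigate across multiple pages on the site.
Can someone guide me on how to accomplish this? Any advice, sample code snippets, or suggested libraries would be appreciated.
Thanks in advance for your help!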

xsuvu9jc1#

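I think you need to handle the authentication process and then navigate across multiple pages. Selenium can do both:
1. Install the required libraries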

pip install selenium
pip install pandas

2. Import the required libraries

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

3. Set up the Selenium WebDriver

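from selenium.webdriver.chrome.service import Service

# specify the path to the browser driver (e.g. chromedriver)
driver_path = 'path_to_chromedriver'
# new instance of the Chrome driver; Selenium 4 takes the driver path through a
# Service object (the executable_path argument of webdriver.Chrome was removed)
driver = webdriver.Chrome(service=Service(executable_path=driver_path))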

4. Open essentialoils.org/db and log in with your credentials

# open website
driver.get('https://www.essentialoils.org/db')
# wait for the login form (the ID and field names are guesses; check the real page)
login_form = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'login-form')))
# find the username and password input fields and enter the credentials
# (find_element_by_name was removed in recent Selenium 4 releases; use By.NAME)
username_input = login_form.find_element(By.NAME, 'username')
username_input.send_keys('your_username')
password_input = login_form.find_element(By.NAME, 'password')
password_input.send_keys('your_password')
# submit form
login_form.submit()
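
Rather than typing the credentials into the script itself, they could be read from environment variables. This is only a minimal sketch: the variable names ESSENTIALOILS_USER and ESSENTIALOILS_PASS and the helper load_credentials are made up for illustration and are not anything the site defines.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    ESSENTIALOILS_USER and ESSENTIALOILS_PASS are hypothetical names
    chosen for this sketch; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["ESSENTIALOILS_USER"], env["ESSENTIALOILS_PASS"]
    except KeyError as missing:
        # fail early with a readable message instead of sending empty credentials
        raise RuntimeError(f"missing environment variable: {missing}") from None
```

The two returned values can then be passed to send_keys in place of 'your_username' and 'your_password'.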

5. Navigate to the data sheets and retrieve the data

# wait for the links to the data sheets to appear after login
links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//a[@class="element-link"]')))
# collect the URLs up front: the link elements go stale once the browser leaves the page,
# so clicking them one by one and calling driver.back() would fail after the first sheet
urls = [link.get_attribute('href') for link in links]
# initialize list to store data
data = []
# iterate over the URLs and retrieve data from each data sheet
for url in urls:
    # open the data sheet directly
    driver.get(url)
    # wait for the data sheet to load, then read its text
    data_sheet = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'data-sheet')))
    data.append(data_sheet.text)
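
Note that data_sheet.text returns each sheet as one block of text, so every sheet would end up in a single CSV cell. If the sheets happen to list one "Label: value" pair per line (an assumption; the real layout has to be checked in the browser), each block could be split into a dictionary first. A sketch, where parse_sheet_text and the sample text are hypothetical:

```python
def parse_sheet_text(text):
    """Split 'Label: value' lines into a dict; lines without a colon are skipped."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip():
            record[label.strip()] = value.strip()
    return record


# made-up placeholder text, only to show the shape of the result
sample = "Name: Example oil\nOrigin: Example region\n\nfree-text note"
record = parse_sheet_text(sample)
```

Inside the loop, data.append(parse_sheet_text(data_sheet.text)) would then store one dictionary per oil instead of raw text.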

6. Save the data to a CSV file

# convert data list to pandas DataFrame
df = pd.DataFrame(data)
# save to CSV file
df.to_csv('essential_oils_data.csv', index=False)
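
If each data sheet is stored as a dictionary rather than a text blob, the sheets may not all have the same fields. The sketch below uses the standard library's csv.DictWriter instead of pandas and takes the union of all keys as the column set. It assumes records is a list of such dictionaries; records_to_csv is a hypothetical helper and the records shown are placeholders.

```python
import csv
import io


def records_to_csv(records, out):
    """Write dict records to `out`, using the union of all keys as columns."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    # restval fills in fields that a given record does not have
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


# placeholder records with different fields, written to an in-memory buffer
buffer = io.StringIO()
records_to_csv([{"Name": "A", "Origin": "X"}, {"Name": "B", "Note": "Y"}], buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass a handle opened with open('essential_oils_data.csv', 'w', newline='', encoding='utf-8') instead of the buffer.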

7. Close the WebDriver

driver.quit()

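P.S. Replace path_to_chromedriver with the actual path to the Chrome WebDriver executable on your system and fill in your real login credentials. The locators in the code ('login-form', 'username', 'password', 'element-link', 'data-sheet') may not match the real site: inspect the elements with your browser's developer tools to find suitable locators, and adjust the XPath or other locator strategies again if the site's structure changes.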
