Selenium: data that changes every time an "li" option is selected - Python Selenium

kognpnkq posted on 2022-11-10 in Python

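I'm trying to scrape data from the site https://www.pais.co.il/info/Thank-to.aspx (ignore the Hebrew).
I need to click any of the options in the first dropdown menu,

then click that button,

and then grab the numbers.

I know how to scrape the numbers and how to click or select a button, but I don't know how to repeatedly select each option from that strange dropdown menu...
Following some suggestions on the internet, I tried clicking the button to open the dropdown menu, but I couldn't: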

button1 = driver.find_element_by_xpath('/html/body/form/div[3]/div[1]/div/div/div[1]/select')

But I get the error: Message: no such element: Unable to locate element
I'd appreciate any help for a newcomer to web scraping :)

Answer #1 by osh3o9ms

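The data you need is loaded with JavaScript, and the city dropdown sits inside an iframe, which is why find_element cannot locate it until the driver switches into that frame. Here Selenium is used only to get the list of cities; the numbers for each city are then fetched with a POST request. Here is one possible solution: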

import csv
import requests
from typing import Union, Any
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

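def get_data(url: str, city_name: str) -> Union[dict[str, Any], str]:
    # Form fields sent to the endpoint; the two Hebrew values are the dropdowns'
    # placeholder texts ("choose a field" / "choose a sub-field"), presumably no filter
    payload = {
        'city': city_name,
        'mainCategory': 'בחר תחום',
        'secondCategory': 'בחר תת תחום'
    }
    headers = {
        'User-Agent': 'Mozilla/5.0'
    }
    try:
        # The response is a JSON array; its first four items are the counters
        r = requests.post(url, data=payload, headers=headers).json()
        return {
            "City Name": city_name,
            "Ventures": r[0],
            "Realizable Investments": r[1],
            "Realized Investments": r[2],
            "Amount Invested Since 1989": r[3]
        }
    except ValueError:
        # .json() raises ValueError when the body is not JSON (no data for this city)
        return f'No data for {city_name}'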

def save_to_csv(data: list) -> None:
    with open(file='pais.csv', mode='a', encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow([*data])

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
service = Service(executable_path="path/to/your/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 15)

main_url = 'https://www.pais.co.il/info/Thank-to.aspx'
post_call_url = 'https://www.pais.co.il/grants/grantsRequestNumbers.ashx'

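driver.get(main_url)
# The city dropdown lives inside an iframe, so switch into it before searching
wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
cities = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#FacilitiesStats_ddlcity>option')))
# Skip the first <option> (presumably the placeholder) and keep the city names
city_names = [city.text for city in cities[1:]]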

for name in city_names:
    data = get_data(post_call_url, name)
    if isinstance(data, dict):
        save_to_csv(data.values())
    else:
        print(data)

driver.quit()
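save_to_csv appends rows without a header, and because the file is opened in append mode, rerunning the script appends the same rows again. If you want column names in the file, here is a minimal sketch of a variant, assuming the same keys that get_data returns: it uses csv.DictWriter and writes the header only when the file is missing or empty. The helper name save_row_with_header is hypothetical.

```python
import csv
import os
from typing import Any, Dict

# Same keys, in the same order, as the dict returned by get_data()
FIELDNAMES = [
    "City Name",
    "Ventures",
    "Realizable Investments",
    "Realized Investments",
    "Amount Invested Since 1989",
]


def save_row_with_header(row: Dict[str, Any], path: str = "pais.csv") -> None:
    """Hypothetical helper: append one row, writing the header first if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode="a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, lineterminator="\n")
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
```

In the loop above you would call save_row_with_header(data) instead of save_to_csv(data.values()). Deleting pais.csv before a rerun still avoids duplicate rows.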

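For some cities there is no data, for example "בוסתאן אל-מרג", so we just print No data for בוסתאן אל-מרג to the console.
Output CSV file pais.csv (columns: city name, ventures, realizable investments, realized investments, amount invested since 1989):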

אבו גוש,19,6117232,14813422,20930654
אבו סנאן,29,6517560,16225629,22743189
אבן יהודה,28,3945008,13107701,17052709
אום אל-פחם,76,56738614,200980004,257718618
אופקים,109,21988456,130339851,152328307
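In all five sample rows the last column equals the sum of the third and fourth (for אבו גוש: 6117232 + 14813422 = 20930654). That is an observation from this sample output, not something the site documents. Here is a small sketch that reads the header-less CSV back with the column names used in get_data and lists any rows where that relationship does not hold. The helpers load_rows and check_totals are hypothetical names.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column order written by save_to_csv (the file itself has no header row)
FIELDNAMES = [
    "City Name",
    "Ventures",
    "Realizable Investments",
    "Realized Investments",
    "Amount Invested Since 1989",
]


def load_rows(f: TextIO) -> List[Dict[str, str]]:
    """Parse header-less pais.csv content into dicts keyed by FIELDNAMES."""
    return list(csv.DictReader(f, fieldnames=FIELDNAMES))


def check_totals(rows: List[Dict[str, str]]) -> List[str]:
    """Return the city names whose total differs from realizable + realized."""
    mismatched = []
    for row in rows:
        realizable = int(row["Realizable Investments"])
        realized = int(row["Realized Investments"])
        total = int(row["Amount Invested Since 1989"])
        if realizable + realized != total:
            mismatched.append(row["City Name"])
    return mismatched


# The five sample rows shown above
sample = io.StringIO(
    "אבו גוש,19,6117232,14813422,20930654\n"
    "אבו סנאן,29,6517560,16225629,22743189\n"
    "אבן יהודה,28,3945008,13107701,17052709\n"
    "אום אל-פחם,76,56738614,200980004,257718618\n"
    "אופקים,109,21988456,130339851,152328307\n"
)
rows = load_rows(sample)
print(len(rows), check_totals(rows))  # 5 []
```

For the real file, open it with open("pais.csv", encoding="utf-8", newline="") and pass the file object to load_rows.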

Tested on Python 3.9.10 with Selenium 4.5.0 and requests 2.28.1 (the dict[str, Any] annotation requires Python 3.9 or newer).
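Of course, the same data could be collected with Selenium alone, without the requests library. But after testing, this approach seemed faster to me: the POST request returns the values immediately, whereas reading them from the markup (div.counter) with Selenium means waiting for the counter animation to finish.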
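To make the POST call concrete: requests.post(url, data=payload) sends the dict as an application/x-www-form-urlencoded form body, and .json() raises a ValueError when the body is not valid JSON, which get_data treats as "no data". Below is a minimal sketch of the same two steps using only the standard library (urllib). It assumes the endpoint returns a JSON array whose first four items are the counters, as the r[0] to r[3] indexing suggests. build_request and parse_response are hypothetical names. The request is built but not sent, so you can inspect it offline.

```python
import json
from typing import Any, Dict, Union
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url: str, city_name: str) -> Request:
    """Build, but do not send, the form POST that get_data() makes with requests."""
    payload = {
        "city": city_name,
        "mainCategory": "בחר תחום",
        "secondCategory": "בחר תת תחום",
    }
    # requests form-encodes a data= dict the same way (urlencode, UTF-8, spaces as '+')
    body = urlencode(payload).encode("ascii")
    return Request(
        url,
        data=body,  # passing a body makes urllib send a POST
        headers={
            "User-Agent": "Mozilla/5.0",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def parse_response(city_name: str, body: bytes) -> Union[Dict[str, Any], str]:
    """Turn a response body into the same dict (or message) that get_data() returns."""
    try:
        r = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return f"No data for {city_name}"
    # Assumption: the body is a JSON array whose first four items are the counters
    return {
        "City Name": city_name,
        "Ventures": r[0],
        "Realizable Investments": r[1],
        "Realized Investments": r[2],
        "Amount Invested Since 1989": r[3],
    }
```

To send it for real, pass the request to urllib.request.urlopen(req, timeout=30) and give resp.read() to parse_response. The requests version in the answer does the same thing with less code.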
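You can also use a ThreadPoolExecutor, for example, so that the requests for different cities run concurrently and fetching and saving the data becomes much faster. Here is one possible solution: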

import csv
import requests
from itertools import repeat
from typing import Union, Any
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from concurrent.futures import ThreadPoolExecutor

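def get_data(url: str, city_name: str) -> Union[dict[str, Any], str]:
    # Form fields sent to the endpoint; the two Hebrew values are the dropdowns'
    # placeholder texts ("choose a field" / "choose a sub-field"), presumably no filter
    payload = {
        'city': city_name,
        'mainCategory': 'בחר תחום',
        'secondCategory': 'בחר תת תחום'
    }
    headers = {
        'User-Agent': 'Mozilla/5.0'
    }
    try:
        # The response is a JSON array; its first four items are the counters
        r = requests.post(url, data=payload, headers=headers).json()
        return {
            "City Name": city_name,
            "Ventures": r[0],
            "Realizable Investments": r[1],
            "Realized Investments": r[2],
            "Amount Invested Since 1989": r[3]
        }
    except ValueError:
        # .json() raises ValueError when the body is not JSON (no data for this city)
        return f'No data for {city_name}'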

def save_to_csv(data: Union[dict, str]) -> None:
    if isinstance(data, dict):
        with open(file='pais.csv', mode='a', encoding="utf-8") as f:
            writer = csv.writer(f, lineterminator='\n')
            writer.writerow([*data.values()])
    else:
        print(data)

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
service = Service(executable_path="path/to/your/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 15)

main_url = 'https://www.pais.co.il/info/Thank-to.aspx'
post_call_url = 'https://www.pais.co.il/grants/grantsRequestNumbers.ashx'

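driver.get(main_url)
# The city dropdown lives inside an iframe, so switch into it before searching
wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
cities = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#FacilitiesStats_ddlcity>option')))
# Skip the first <option> (presumably the placeholder) and keep the city names
city_names = [city.text for city in cities[1:]]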

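with ThreadPoolExecutor() as executor:
    # Fetch concurrently; map() yields results in the same order as city_names
    results = executor.map(get_data, repeat(post_call_url), city_names)
    # Write from the main thread so only one thread appends to pais.csv at a time
    for result in results:
        save_to_csv(result)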

driver.quit()
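Here is how the threaded version passes its arguments. executor.map(fn, repeat(url), city_names) zips the iterables, so every call gets the same URL plus one city name, and the call stops when city_names runs out. The results are also yielded in input order, even if the calls finish out of order. The following minimal offline sketch uses a hypothetical fake_get_data in place of the real POST request, with a short random sleep standing in for network latency.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
from typing import Any, Dict, List


def fake_get_data(url: str, city_name: str) -> Dict[str, Any]:
    """Hypothetical stand-in for get_data(): sleeps instead of calling the server."""
    time.sleep(random.uniform(0.0, 0.05))  # calls finish in arbitrary order
    return {"City Name": city_name, "URL": url}


def fetch_all(url: str, city_names: List[str]) -> List[Dict[str, Any]]:
    with ThreadPoolExecutor(max_workers=4) as executor:
        # repeat(url) supplies the same first argument to every call;
        # map() stops when the shortest iterable (city_names) is exhausted
        return list(executor.map(fake_get_data, repeat(url), city_names))


cities = ["אבו גוש", "אבו סנאן", "אבן יהודה", "אום אל-פחם", "אופקים"]
results = fetch_all("https://example.invalid/endpoint", cities)
print([r["City Name"] for r in results] == cities)  # True: input order is preserved
```

Because the results come back in order, writing them from a single loop in the main thread keeps the CSV rows in dropdown order. It also avoids several threads appending to the same file at once.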
