Unable to send keys with Selenium when using multithreading

wlzqhblo  posted on 2023-01-17  in Other

I am trying to use a multithreading strategy with Selenium. In short, I am trying to fill in an input field with ids.
Here is my script:

from concurrent.futures import ThreadPoolExecutor
from selenium.webdriver.common.by import By
import numpy as np
import sys
from selenium import webdriver

def driver_setup():
    path = "geckodriver.exe"
    options = webdriver.FirefoxOptions()
    options.add_argument('--incognito')
    # options.add_argument('--headless')
    driver = webdriver.Firefox(options=options, executable_path=path)
    return driver

def fetcher(id, driver):
    print(id) #this works
    
    # this doesnt work
    driver.get(
        "https://www.roboform.com/filling-test-all-fields")
    driver.find_element(By.XPATH, '//input[@name="30_user_id"]').send_keys(id)
    time.sleep(2)
    print(i, " sent")
    #return data

def crawler(ids):
    for id in ids:
        print(i)
        results = fetcher(id, driver_setup())

drivers = [driver_setup() for _ in range(4)]

ids = list(range(0,50)) # generates ids
print(ids)
chunks = np.array_split(np.array(ids),4) #splits the id list into 4 chunks

with ThreadPoolExecutor(max_workers=4) as executor:
    bucket = executor.map(crawler, chunks)
    #results = [item for block in bucket for item in block]

[driver.quit() for driver in drivers]

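Everything works except the send_keys method. Both print() calls work, so the ids do seem to reach both functions. Strangely, I get no error message (PyCharm only reports "Process finished with exit code 0"), so I cannot tell what I am doing wrong.
Any idea what is missing?
I followed this example, in case it helps: https://blog.devgenius.io/multi-threaded-web-scraping-with-selenium-dbcfb0635e83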

2ul0zpep #1

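When working with threads, watch out for exceptions: they are captured in the futures instead of being raised in the main thread. For example, change your code to the tweaked version below (do not change any other lines):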

with ThreadPoolExecutor(max_workers=4) as executor:
    bucket = executor.map(crawler, chunks)
    # bucket is a lazy iterator over the results; iterating it re-raises worker exceptions
    for e_buck in bucket:  # added for demonstration
        print(e_buck)
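
The silent exit in the original script can be reproduced without Selenium. The sketch below is only an illustration: `worker`, `run_without_consuming` and `run_and_consume` are hypothetical names, and the undefined `i` is a deliberate bug standing in for the one in the question. It shows that an exception raised inside a pool thread stays hidden until the results of executor.map are iterated.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(chunk):
    # Deliberate bug mirroring the question: `i` is never defined,
    # so this raises NameError inside the worker thread.
    for item in chunk:
        print(i)
    return len(chunk)


def run_without_consuming(chunks):
    # The results are never iterated, so the NameError stays
    # stored in the pool and nothing is printed.
    with ThreadPoolExecutor(max_workers=2) as executor:
        executor.map(worker, chunks)
    return "finished without any traceback"


def run_and_consume(chunks):
    # Iterating the results re-raises the first worker exception here.
    with ThreadPoolExecutor(max_workers=2) as executor:
        try:
            return list(executor.map(worker, chunks))
        except NameError as exc:
            return f"caught: {exc}"
```

Calling run_without_consuming([[1], [2]]) returns normally even though every worker failed, whereas run_and_consume([[1], [2]]) reports the NameError. This is why looping over bucket exposes the errors in the script.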

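You will then see exceptions such as:

  1. i is not defined: look at the print(i, " sent") statement in fetcher and the print(i) statement in crawler.
  2. Once you fix that, the next error comes from the id passed to send_keys(id): id is of type numpy.int64. Cast it to str with str(), i.e. send_keys(str(id)).
  3. The original script also calls time.sleep(2) without importing time; the version below adds import time.

After these fixes, your code looks like this: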
from concurrent.futures import ThreadPoolExecutor
import time

import numpy as np
from selenium import webdriver
from selenium.webdriver.common.by import By

def driver_setup():
    path = "geckodriver.exe"
    options = webdriver.FirefoxOptions()
    options.add_argument('--incognito')
    # options.add_argument('--headless')
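    # Note: executable_path is deprecated in Selenium 4; newer releases expect service=Service(path) instead
    driver = webdriver.Firefox(options=options, executable_path=path)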
    return driver

def fetcher(id, driver):
    print(id) #this works
    
    # this doesnt work - it will work now :)
    driver.get(
        "https://www.roboform.com/filling-test-all-fields")
    driver.find_element(By.XPATH, '//input[@name="30_user_id"]').send_keys(str(id))
    time.sleep(2)
    print(id, " sent")
    #return data

def crawler(ids):
    for id in ids:
        print(id)
        results = fetcher(id, driver_setup())

#drivers = [driver_setup() for _ in range(4)]

ids = list(range(0,50)) # generates ids
print(ids)
chunks = np.array_split(np.array(ids),4) #splits the id list into 4 chunks

with ThreadPoolExecutor(max_workers=4) as executor:
    bucket = executor.map(crawler, chunks)
    # bucket is a lazy iterator over the results; iterating it re-raises worker exceptions
    for e_buck in bucket:  # added for demonstration
        print(e_buck)  # on the first run this loop raised: name 'i' is not defined
        # (see print(i, " sent") in fetcher and print(i) in crawler)
        # after fixing that, the next error came from send_keys(id): id is a numpy.int64, so cast it with send_keys(str(id))
    #results = [item for block in bucket for item in block]

#[driver.quit() for driver in drivers]
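
Note that the fixed script still starts a new Firefox instance for every id and never closes it. A possible refinement, sketched below under assumptions, is to open one driver per chunk and always quit it. `split_evenly`, `crawl_chunk`, `run`, `driver_factory` and `fill` are hypothetical names that do not appear in the original: `driver_factory` stands for something like driver_setup, and `fill` for a function that loads the page and types the text. Splitting with plain Python instead of np.array_split also keeps the ids as ordinary int values.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n):
    """Split items into n nearly equal chunks of plain Python values."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for index in range(n):
        # The first `extra` chunks get one additional item.
        end = start + size + (1 if index < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def crawl_chunk(ids, driver_factory, fill):
    """Process one chunk with a single driver that is always closed."""
    driver = driver_factory()
    try:
        # send_keys expects text, so each id is converted with str().
        return [fill(driver, str(item)) for item in ids]
    finally:
        driver.quit()


def run(ids, driver_factory, fill, workers=4):
    """Run one crawl_chunk per worker and collect the results in order."""
    chunks = split_evenly(ids, workers)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [
            executor.submit(crawl_chunk, chunk, driver_factory, fill)
            for chunk in chunks
        ]
        # result() re-raises any exception that occurred in the worker.
        return [item for future in futures for item in future.result()]
```

In the script above this might be called as run(list(range(50)), driver_setup, fill), where fill(driver, text) wraps the driver.get and send_keys calls from fetcher.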
myzjeezk #2

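You may be calling send_keys() too early, before the <input> field has fully rendered.
Solution
Ideally, to send a character sequence to the element, induce WebDriverWait for element_to_be_clickable(). You can use any of the following locator strategies: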

  • Using the name:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "30_user_id"))).send_keys(id)
  • Using a CSS selector:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='30_user_id']"))).send_keys(id)
  • Using XPath:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='30_user_id']"))).send_keys(id)
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
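
For readers unfamiliar with explicit waits, the idea behind WebDriverWait(...).until(...) is polling: call a condition repeatedly until it returns something truthy or a timeout expires. The following is a simplified, Selenium-free sketch of that idea, not Selenium's actual implementation; `wait_until` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Hand back whatever the condition produced, e.g. the element.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)
```

With Selenium itself, the condition is the EC.element_to_be_clickable(...) object and the returned value is the element. If the id comes from a NumPy array, as in the question, it would still need to be passed as send_keys(str(id)).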
