Selenium WebDriver crashes when I try to get innerText from element containers

lf5gs5x2 posted on 2023-02-08 in Other

I'm trying to get the innerText of every .message.spoilers-container element, but when I scroll up the page the program crashes with an error.
Code:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

def find_message_container(driver):
    try:
        elements = driver.execute_script("return document.querySelectorAll('.message.spoilers-container')")
        unique_texts = set()

        for element in elements:
            text = element.get_attribute("innerText")

            if text not in unique_texts:
                unique_texts.add(text)

            with open("unique_texts.txt", "w") as file:
                for text in unique_texts:
                    file.write("\n" + text + "\n")

    except NoSuchElementException as e:
        print('Could not find the given element container. The following exception was raised:\n', e)
        pass
    
    return unique_texts

Error:

Traceback (most recent call last):
  File "c:\~\Desktop\Project\file.py", line 11, in find_message_container
    text = element.get_attribute("innerText")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webelement.py", line 179, in get_attribute   
    attribute_value = self.parent.execute_script(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 506, in execute_script   
    return self.execute(command, {"script": script, "args": converted_args})["value"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 444, in execute
    self.error_handler.check_response(response)
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 249, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=109.0.5414.120)

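What is causing this? The site I'm testing is Web Telegram. Every time new chats are loaded by scrolling up, a new container appears.
Any help would be appreciated. I've tried a few wait statements and wait.until, but they had no effect.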

Answer 1 (i34xakig)

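I don't have a Web Telegram account, so I can't test this, but here is what I would change:
1. The main problem is the StaleElementReferenceException. A stale element is one you stored in a variable, after which the page changed (the underlying DOM node was removed or re-rendered), and then you tried to call .click() or .text on it. Once that happens, the reference you hold is dead... it no longer points to anything on the page. Here is a quick example of how this can happen: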

element = driver.find_element(*locator)  # got a reference; locator is a (By, value) tuple
# while doing stuff, the page changes
value = element.text  # accessing the element via .text throws the exception

To avoid this, look the element up again right before you use the reference:

element = driver.find_element(*locator)
# while doing stuff, the page changes
element = driver.find_element(*locator)  # re-fetch the element
value = element.text

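In your case this happens inside the message loop. You built the list before the loop, so if one of its elements changes while the loop is running, the exception is thrown. The fix is to fetch elements again instead of reusing references from an older list. Keep in mind that in the loop below, find_elements() is called only once, when the loop starts: it replaces the execute_script() lookup, but a message that is re-rendered partway through the loop can still go stale.

for element in driver.find_elements(...):
    ...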

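A potentially big problem: if you are in a fast-moving chat room with a constant stream of new messages, your script may not be able to keep up, because the page DOM seems to change with every new message. This is an assumption based on your comments.
2. Prefer the native API over driver.execute_script() for finding elements. Replace

elements = driver.execute_script("return document.querySelectorAll('.message.spoilers-container')")

with

elements = driver.find_elements(By.CSS_SELECTOR, '.message.spoilers-container')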

3. Use .text instead of .get_attribute("innerText"). Replace

text = element.get_attribute("innerText")

with
text = element.text

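4. Writing a file is a relatively slow operation, and in your code the file is reopened and rewritten on every pass through the loop. I would hold off writing until the loop has finished.
5. Why return unique_texts if you have already written them to a file?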
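The re-fetch advice from point 1 can be wrapped in a small reusable helper. This is a minimal sketch, not part of the original answer: `read_with_refetch`, `locate` and `read` are hypothetical names, and it assumes `locate` performs a fresh lookup every time it is called. The stale exception class is passed in rather than imported, so the logic runs without a browser.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
E = TypeVar("E")


def read_with_refetch(
    locate: Callable[[], E],
    read: Callable[[E], T],
    stale_errors: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.2,
) -> T:
    """Look an element up, read from it, and look it up again if the reference went stale.

    `locate` must do a fresh lookup on every call (hypothetical contract), so each
    attempt works with a reference taken from the current DOM.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        element = locate()  # fresh reference on every attempt
        try:
            return read(element)
        except stale_errors:
            if attempt == attempts - 1:
                raise  # still stale after the last attempt: give up
            time.sleep(delay)  # let the page settle before looking the element up again
    raise AssertionError("unreachable")
```

With Selenium, `locate` would be something like `lambda: driver.find_element(By.CSS_SELECTOR, ".message.spoilers-container")`, `read` would be `lambda el: el.text`, and `stale_errors` would be `(StaleElementReferenceException,)`.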
Here is your code rewritten based on these suggestions:

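from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def find_message_container(driver):
    try:
        unique_texts = set()
        # By.CSS_SELECTOR is a string constant, so the selector is passed as a separate argument
        for element in driver.find_elements(By.CSS_SELECTOR, '.message.spoilers-container'):
            unique_texts.add(element.text)  # a set ignores duplicates on its own

        # write once, after the loop; utf-8 so emoji and non-Latin text can't fail on Windows' default encoding
        with open("unique_texts.txt", "w", encoding="utf-8") as file:
            for text in unique_texts:
                file.write("\n" + text + "\n")

    # note: find_elements() returns an empty list rather than raising NoSuchElementException
    except NoSuchElementException as e:
        print('Could not find the given element container. The following exception was raised:\n', e)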
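The rewritten loop above still iterates over a list fetched once, so a message re-rendered mid-loop can still raise StaleElementReferenceException. One way to tolerate that, sketched here as an assumption rather than as either answerer's method: when any reference goes stale, start a new pass over a freshly fetched list. Because results go into a set, re-reading messages from an earlier pass is harmless. `collect_unique_texts` and its parameters are hypothetical, and the Selenium pieces are passed in so the logic can run without a browser.

```python
from typing import Callable, Iterable, Set, Tuple, Type, TypeVar

E = TypeVar("E")


def collect_unique_texts(
    find_all: Callable[[], Iterable[E]],
    read: Callable[[E], str],
    stale_errors: Tuple[Type[BaseException], ...],
    max_passes: int = 5,
) -> Set[str]:
    """Read the text of every element returned by `find_all`, starting a new pass
    with a freshly fetched list whenever one of the references has gone stale.

    Best effort: if every pass hits a stale element, the texts gathered so far
    are returned instead of raising.
    """
    unique_texts: Set[str] = set()
    for _ in range(max_passes):
        try:
            for element in find_all():  # new list, new references, on each pass
                unique_texts.add(read(element))  # the set drops repeats from earlier passes
            return unique_texts  # finished a full pass without a stale reference
        except stale_errors:
            continue  # the DOM changed mid-pass: fetch the list again
    return unique_texts
```

With Selenium this might be called as `collect_unique_texts(lambda: driver.find_elements(By.CSS_SELECTOR, ".message.spoilers-container"), lambda el: el.text, (StaleElementReferenceException,))`, with the file writing from the function above applied to the result.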

Answer 2 (cgyqldqp)

The core exception is StaleElementReferenceException...

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

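...which means that after document.querySelectorAll() returned the NodeList of all matching elements in the document, some of those elements were detached and went stale as new chats were loaded into new containers.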
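Since the stale references come from element handles outliving the DOM nodes they point to, another option (not proposed in either answer, and at odds with Answer 1's preference for the native API) is to turn the NodeList into plain strings inside the page, in a single synchronous script call. Strings cannot go stale. The names `COLLECT_TEXTS_JS`, `collect_unique_texts_in_browser` and `ScriptRunner` are hypothetical; the sketch relies only on Selenium's `execute_script(script, *args)`, where `arguments[0]` is the first extra argument. It captures only messages currently in the DOM, so if Telegram removes messages that scroll out of view, you would call it after each scroll and merge the sets.

```python
from typing import Any, Protocol, Set


class ScriptRunner(Protocol):
    """Anything shaped like Selenium's execute_script(script, *args), e.g. a WebDriver."""

    def execute_script(self, script: str, *args: Any) -> Any: ...


# Runs synchronously in the page: the NodeList is converted to strings in one go,
# so no element references are ever handed back to Python.
COLLECT_TEXTS_JS = (
    "return Array.from(document.querySelectorAll(arguments[0]),"
    " el => el.innerText);"
)


def collect_unique_texts_in_browser(driver: ScriptRunner, selector: str) -> Set[str]:
    """Return the distinct innerText values of every element matching `selector`."""
    texts = driver.execute_script(COLLECT_TEXTS_JS, selector)
    return set(texts or [])  # guard against a None result
```

For this question the call would be `collect_unique_texts_in_browser(driver, ".message.spoilers-container")`.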
Solution
One possible solution is to induce WebDriverWait for visibility_of_all_elements_located(), using either of the following locator strategies:

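  • Using CSS_SELECTOR:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".message.spoilers-container")))
  • Using XPATH (matching both classes, as the CSS selector does, even when the element has other classes too):
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[contains(concat(' ', normalize-space(@class), ' '), ' message ') and contains(concat(' ', normalize-space(@class), ' '), ' spoilers-container ')]")))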
      • Note: you must add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
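The wait above still returns element references, which can go stale after until() returns. That may be why, as the question mentions, waits alone had no effect. A hedged variation: WebDriverWait.until() accepts any callable that takes the driver, and WebDriverWait's `ignored_exceptions` parameter lists exceptions to swallow while polling. Reading the texts inside the condition means a stale reference simply triggers another poll. `texts_of_all_elements_located` is a hypothetical helper, not part of `expected_conditions`.

```python
from typing import Any, Callable, List, Tuple, Union

Locator = Tuple[str, str]  # e.g. (By.CSS_SELECTOR, ".message.spoilers-container")


def texts_of_all_elements_located(locator: Locator) -> Callable[[Any], Union[List[str], bool]]:
    """Hypothetical custom wait condition for WebDriverWait.until().

    Returns the texts of all matching elements, or False (keep polling) if none
    match yet. Because .text is read inside the condition, a stale reference
    raises while WebDriverWait is still polling; if StaleElementReferenceException
    is listed in ignored_exceptions, the wait just tries again.
    """
    def condition(driver: Any) -> Union[List[str], bool]:
        elements = driver.find_elements(*locator)  # fresh lookup on every poll
        texts = [element.text for element in elements]
        return texts if texts else False
    return condition
```

Usage would look like `texts = WebDriverWait(driver, 20, ignored_exceptions=[StaleElementReferenceException]).until(texts_of_all_elements_located((By.CSS_SELECTOR, ".message.spoilers-container")))`, which also needs `from selenium.common.exceptions import StaleElementReferenceException` alongside the imports listed above.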
