Selenium: how to find the last element loaded on a page to use in WebDriverWait, so that the whole page loads as reliably as possible?

thtygnil, posted on 2023-01-17 in Other

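This page loads dynamically, with different elements appearing at different times:
https://www.globo.com/
I am waiting on an element that I noticed takes a little longer to load than the others: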

WebDriverWait(driver, 30).until(
    EC.element_to_be_clickable(
        (By.XPATH, "//div[contains(@class,'tooltip-vitrine')]")
    )
)

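But I would like to know whether there is a way to track the order in which elements are loaded on the page, so that I can find a pattern and wait on an element that always takes longer than the others. That would give me more confidence that the page has loaded completely.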

yr9zkbsy #1

Try this solution: check document.readyState and wait until it returns complete.
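As a minimal sketch of that check, assuming nothing beyond a driver object that exposes Selenium's execute_script(): the helper name wait_for_ready_state and its timeout and poll parameters are illustrative and not part of Selenium.

```python
import time


def wait_for_ready_state(driver, timeout=30.0, poll=0.5):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    `driver` is any object exposing Selenium's execute_script().
    The function name and parameters are hypothetical, chosen for illustration.
    Returns True when the state is 'complete', False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        # wait a little before asking the browser again
        time.sleep(poll)
```

With a real driver the same condition can be handed to Selenium directly: WebDriverWait(driver, 30).until(lambda d: d.execute_script("return document.readyState") == "complete"). Keep in mind that readyState only describes the initial document load, so scripts may still inject elements after it reports complete.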

ktca8awb #2

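I don't know whether there is a way to check that a page has fully loaded, so I don't know whether there is a reliable way to find the last element loaded.
A simple approach is to check the number of elements in the page while it loads: the number should keep growing and then stop changing once the page is fully loaded.
(Note: the document.readyState part was added to check whether Roman J's answer works. It does not seem to, because it prints complete even while new elements are still being loaded afterwards.)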

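import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# assumption: the original does not show how the driver is created; any WebDriver works
driver = webdriver.Chrome()
driver.get('https://globo.com')
lists_of_elements = [[]]
time_old = time.time()
# maximum time in seconds to wait without the element count changing
max_wait = 50

while True:
    # find all elements in the page
    elements = driver.find_elements(By.XPATH, '//*')
    time_new = time.time()

    # compare the number of elements between the new list and the previous list
    if len(elements) != len(lists_of_elements[-1]):
        state = driver.execute_script("return document.readyState")
        print(f'loaded elements: {len(elements)} - doc state: {state}')
        lists_of_elements.append(elements)
        # the page changed, so restart the waiting period
        time_old = time_new

    # no change for max_wait seconds: assume loading has finished
    if time_new - time_old > max_wait:
        print('page seems to be fully loaded')
        break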

Output

loaded elements: 3053 - doc state: complete
loaded elements: 3054 - doc state: complete
loaded elements: 3153 - doc state: complete
loaded elements: 3152 - doc state: complete
loaded elements: 3156 - doc state: complete
loaded elements: 3160 - doc state: complete
page seems to be fully loaded
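The same idea can be packaged as a reusable wait. The sketch below is one possible variant, not the answer's own code: it counts elements with a small JavaScript snippet instead of find_elements, and the name wait_until_dom_stable and the parameters quiet_period, timeout and poll are hypothetical. It only assumes a driver object exposing execute_script().

```python
import time


def wait_until_dom_stable(driver, quiet_period=5.0, timeout=60.0, poll=0.5):
    """Return the element count once it has stayed unchanged for quiet_period seconds.

    `driver` is any object exposing Selenium's execute_script().
    Raises TimeoutError if the count is not stable before `timeout` seconds pass.
    """
    # counting in the browser avoids transferring thousands of element handles
    script = "return document.getElementsByTagName('*').length"
    start = last_change = time.monotonic()
    last_count = driver.execute_script(script)
    while True:
        time.sleep(poll)
        now = time.monotonic()
        count = driver.execute_script(script)
        if count != last_count:
            # the DOM changed, so restart the quiet period
            last_count, last_change = count, now
        elif now - last_change >= quiet_period:
            return count
        if now - start >= timeout:
            raise TimeoutError(f"DOM not stable after {timeout} s")
```

A shorter quiet_period returns sooner but is more likely to give up before late content such as ads has arrived, so the value is a trade-off to tune per site.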

Then run the code below to see which elements were loaded last (that is, their HTML code):

# compute the difference between the last two lists
last_loaded_elements = list(set(lists_of_elements[-1]) - set(lists_of_elements[-2]))
for idx, el in enumerate(last_loaded_elements):
    print(f"element {idx}\n{el.get_attribute('outerHTML')}\n")

Output

element 0
<link rel="preload" href="https://adservice.google.it/adsid/integrator.js?domain=www.globo.com" as="script">

element 1
<script type="text/javascript" src="https://adservice.google.com/adsid/integrator.js?domain=www.globo.com"></script>

element 2
<iframe frameborder="0" src="https://a3b68e638f6dccabe7e288ddc2ab6c43.safeframe.googlesyndication.com/safeframe/1-0-40/html/container.html" id="google_ads_iframe_/95377733/tvg_Globo.com.Home_0" title="3rd party ad content" name="" scrolling="no" marginwidth="0" marginheight="0" width="970" height="250" data-is-safeframe="true" sandbox="allow-forms allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-top-navigation-by-user-activation" role="region" aria-label="Advertisement" tabindex="0" data-google-container-id="1" style="border: 0px; vertical-align: bottom;" data-load-complete="true"></iframe>

element 3
<script type="text/javascript" src="https://adservice.google.it/adsid/integrator.js?domain=www.globo.com"></script>

element 4
<link rel="preload" href="https://adservice.google.com/adsid/integrator.js?domain=www.globo.com" as="script">

element 5
<div id="google_ads_iframe_/95377733/tvg_Globo.com.Home_0__container__" style="border: 0pt none; margin: auto; text-align: center; width: 970px; height: 250px;"><iframe frameborder="0" src="https://a3b68e638f6dccabe7e288ddc2ab6c43.safeframe.googlesyndication.com/safeframe/1-0-40/html/container.html" id="google_ads_iframe_/95377733/tvg_Globo.com.Home_0" title="3rd party ad content" name="" scrolling="no" marginwidth="0" marginheight="0" width="970" height="250" data-is-safeframe="true" sandbox="allow-forms allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-top-navigation-by-user-activation" role="region" aria-label="Advertisement" tabindex="0" data-google-container-id="1" style="border: 0px; vertical-align: bottom;" data-load-complete="true"></iframe></div>
