Python Selenium keeps screenshotting the first element on every iteration of the loop

Posted by vwkv1x7d on 2022-12-20 in Python

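I'm trying to use Selenium with Python to take a screenshot of every comment on a Reddit post. All the comments have the same id/class, and that is what I use to select them.
Here is my code: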

import requests
from bs4 import BeautifulSoup
import pyttsx3, pyautogui

from PIL import Image
from io import BytesIO

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys

# Selenium 4 takes the chromedriver path through a Service object (executable_path is deprecated)
driver = webdriver.Chrome(service=Service(r'C:\Selenium_Drivers\chromedriver.exe'))

url = 'https://www.reddit.com/user/UoPeople09/comments/wlt4qj/what_made_you_apply_at_uopeople/'

driver.get(url)
driver.implicitly_wait(5)

total_height = int(driver.execute_script("return document.body.scrollHeight"))

u = 1
for i in range(1, total_height*2, 50):
    driver.execute_script(f"window.scrollTo(0, {i})")
 
    comment = driver.find_element(By.CSS_SELECTOR, 'div#t1_ikllxsq._3sf33-9rVAO_v4y0pIW_CH')
    comment.screenshot(rf'E:\WEB SCRAPING PROJECTS\PROJECTS\Reddit Scraping\shot{u}.png')
    u += 1

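My code scrolls down the page and saves the screenshots to the path I want, but every screenshot shows the same thing: the first element (comment) of the Reddit post.
I want my code to save a separate screenshot of each comment. Any help is appreciated.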
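Why this happens: the loop calls `find_element` (singular) with a selector containing `#t1_ikllxsq`, an id that belongs to one specific comment, so every iteration returns that same element; scrolling the window does not change which element matches. The loop also screenshots once per 50 px of scrolling rather than once per comment. Both answers below instead collect every match with `find_elements` (plural) and loop over that list. A minimal sketch of that pattern follows; `screenshot_elements` is a hypothetical helper, and the usage comment assumes the hashed class `_3sf33-9rVAO_v4y0pIW_CH` from the question is shared by every comment container (generated class names like this are not guaranteed to stay stable).

```python
from pathlib import Path


def screenshot_elements(driver, elements, out_dir, prefix="shot"):
    """Scroll each element into view and save one PNG per element.

    `driver` is a Selenium WebDriver and `elements` a list of WebElements,
    e.g. the result of driver.find_elements(...). Returns the paths that
    were written successfully.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for n, element in enumerate(elements, start=1):
        # Bring this particular element into the viewport before capturing it
        driver.execute_script("arguments[0].scrollIntoView();", element)
        path = out / f"{prefix}{n}.png"
        # WebElement.screenshot() returns False if the file could not be written
        if element.screenshot(str(path)):
            saved.append(path)
    return saved


# Usage (assumes the hashed class below is shared by every comment container):
# from selenium.webdriver.common.by import By
# comments = driver.find_elements(By.CSS_SELECTOR, "div._3sf33-9rVAO_v4y0pIW_CH")
# screenshot_elements(driver, comments, r"E:\WEB SCRAPING PROJECTS\PROJECTS\Reddit Scraping")
```

The helper takes the element list rather than a selector, so the same loop works with whichever locator turns out to match all comments.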

Answer #1 (dced5bon)

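To get a screenshot of each comment, you need to locate all the comment elements, scroll to each comment in turn, and then take the screenshot.
This approach worked for me.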

import time

url = 'https://www.reddit.com/user/UoPeople09/comments/wlt4qj/what_made_you_apply_at_uopeople/'
driver.get(url)
# Dismiss the cookie banner
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(.,'Reject non-essential')]"))).click()
# Get all the comments
comments = driver.find_elements(By.CSS_SELECTOR, "[data-testid='comment_author_link']")
print(len(comments))

for i in range(len(comments)):
    # Scroll to each comment (reading this property scrolls the element into view)
    comments[i].location_once_scrolled_into_view
    time.sleep(2)  # slow the script down so the page settles before the screenshot
    driver.save_screenshot(rf'E:\WEB SCRAPING PROJECTS\PROJECTS\Reddit Scraping\shot{i+1}.png')

Note: everything else is already imported in your code; `time` is the only extra import you need.
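Both the question and this answer write paths such as `'E:\WEB SCRAPING PROJECTS\...'` as ordinary string literals. They happen to work because `\W`, `\P`, `\R` and `\s` are not recognized escape sequences, but invalid escapes have been deprecated since Python 3.6 (a `SyntaxWarning` since 3.12), and a folder name starting with `n` or `t` would silently become a newline or tab. A small sketch of a safer spelling, assuming the asker's folder and the `shot{n}.png` naming from the code above; `shot_path` and `OUTPUT_DIR` are hypothetical names:

```python
from pathlib import PureWindowsPath

# Output folder from the question, written as a raw string so backslashes stay literal.
# PureWindowsPath only builds the string; it never touches the filesystem.
OUTPUT_DIR = PureWindowsPath(r"E:\WEB SCRAPING PROJECTS\PROJECTS\Reddit Scraping")


def shot_path(n):
    """Return the Windows path for the n-th screenshot (shot1.png, shot2.png, ...)."""
    return str(OUTPUT_DIR / f"shot{n}.png")


# Why this matters: in an ordinary literal, "\n" is a newline, not backslash + n.
plain = "C:\new_folder"  # contains a newline character: not the intended path
raw = r"C:\new_folder"   # contains a real backslash followed by "new_folder"
```

In the loop above this would be used as `driver.save_screenshot(shot_path(i + 1))`. On the Windows machine itself, `pathlib.Path` can replace `PureWindowsPath` if you also want to create the folder.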

Answer #2 (6rvt4ljy)

Here is an example that also scrolls to the end of the page first:

# Required libraries
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver

# Initialize the driver and navigate
driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://www.reddit.com/user/UoPeople09/comments/wlt4qj/what_made_you_apply_at_uopeople/'
wait = WebDriverWait(driver, 5)
driver.get(url)

# Wait for the reject-cookies button and click it
reject_cookies_button = wait.until(EC.element_to_be_clickable((By.XPATH, "(//section[@class='_2BNSty-Ld4uppTeWGfEe8r']//button)[2]")))
reject_cookies_button.click()

# Scroll to the end of the page (stop once the page height no longer grows)
while True:
    height_before_scroll = driver.execute_script('return document.body.scrollHeight')
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(2)
    if driver.execute_script('return document.body.scrollHeight') == height_before_scroll:
        break

# Collect every comment element on the page
comments = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class, 'Comment')]")))

# Scroll each comment into view, take a screenshot of it, and save it
for u, comment in enumerate(comments, start=1):
    driver.execute_script("arguments[0].scrollIntoView();", comment)
    comment.screenshot(f'./shot{u}.png')

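I hope the comments in the code help you understand what is going on.
I wrote this on Linux; on another system, just initialize the driver with the chromedriver for that system.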
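The `while True` loop in this answer keeps scrolling until `document.body.scrollHeight` stops changing, presumably so that comments loaded on scroll are present before they are collected. Below is a sketch of the same idea as a reusable function. The name `scroll_to_bottom` and the `max_rounds` cap are additions, not part of the answer; the cap keeps a page that keeps loading content from hanging the script.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll until document.body.scrollHeight stops growing.

    Returns the number of scroll rounds performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # height unchanged: assume the end of the page
        last_height = new_height
    return max_rounds
```

With this, the scrolling section above becomes a single `scroll_to_bottom(driver)` call before `comments = wait.until(...)`. Reusing the last measured height avoids one extra `execute_script` round trip per iteration compared with re-reading it before each scroll.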
