Chrome: My Python code to web-scrape and download five images is not working (using Blender 3D as the IDE)

58wvjzkj  posted 12 months ago in  Go

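I am running this code to web-scrape and download five images from Google. What should happen is that, after I run the code, the Chrome browser appears, the code makes the mouse click an image and download it, then scrolls down to another image, clicks and downloads it, and so on, up to five times. With this code, the Chrome browser appears for a second and then closes, and nothing else happens... even though the code does not throw any Python errors.
I am using the Blender 3D modeling software as my IDE because I hope to write Blender add-ons in Python in the future (a Blender add-on is a bit like an extension in Google Chrome: a small piece of software you can install into Blender to extend its functionality). That is why there are extra import lines at the top of my code...
A related item is that I get this warning:
E:\GLOBAL ASSETS\SCRIPTING\Web Scraping Images\web-scraper.blend\web-scraper.py:21: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
This is the only other message in the console after I run the code:
DevTools listening on ws://127.0.0.1:52643/devtools/browser/ea448f70-0066-4d50-bfb8-8671528789b8
Any help would be greatly appreciated.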
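On the deprecation warning quoted above: Selenium 4 expects the driver path to be wrapped in a Service object rather than passed as executable_path. A minimal sketch, assuming Selenium 4 is installed; make_chrome_driver is a hypothetical helper name, not part of Selenium:

```python
def make_chrome_driver(driver_path):
    """Start Chrome through a Service object instead of executable_path."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Wrap the chromedriver path in a Service and hand that to Chrome.
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service)
```

In the listing below, this would replace wd = webdriver.Chrome(PATH) with wd = make_chrome_driver(PATH). It should only remove the warning; it is not expected to change the scraping behavior by itself.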

import bpy
import subprocess
import sys
import os
import cv2
import random
from random import randrange
from PIL import Image #make sure both pil from c:\users\mjoe6\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages is in blender pip3.exe folder
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import io
import time

# path to python.exe
python_exe = os.path.join(sys.prefix, 'bin', 'python.exe')
py_lib = os.path.join(sys.prefix, 'lib', 'site-packages','pip')

PATH = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe"
wd = webdriver.Chrome(PATH)

def get_images_from_google(wd, delay, max_images):
    def scroll_down(wd):
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)
        
        url = "https://www.google.com/search?q=cats+2019+IMDb&sxsrf=ALiCzsZmBIp-JZmZv23v6ORoc0VL2NRuxg:1654543304286&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRjquPxpn4AhUymo4IHbC6Cx0Q_AUoAnoECAEQBA&biw=1536&bih=714&dpr=1.25#imgrc=ixH-uoDQFN_gpM"
        wd.get(url)
        
        image_urls = set()
        while len (image_urls) < max_images:
            scroll_down(wd)
            thumbnails = wd.find_elements(By.CLASS_NAME, "Q4LuWd")
            for img in thumbnails[len(image_urls):max_images]:
                try:
                    img.click()
                    time.sleep.delay
                except:
                    continue    
                images = wd.find_elements(By.CLASS_NAME, "n3VNCb")
                for image in images:
                    if image.get_attribute('src') and 'http' in image.get_attribute('src'):
                        image_urls.add(image.get_attribute('src'))
                        print(f"Found {len (image_urls)}")
        return image_urls
        
def download_image(download_path, url, file_name):
    try:
        image_content = requests.get(url).content
        image_file = io.BytesIO(image_content)
        image = Image.open(image_file)
        file_path = download_path + file_name
        with open(file_path, "wb") as f:
            image.save(f, "PNG")
        
        print("Success")
    except Exception as e:
        print('FAILED -', e)    

urls = get_images_from_google(wd, 1, 5)
print(urls)
wd.quit()
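
Two details in the listing above, as it is indented here, could each explain the "no error, no result" symptom. First, everything after time.sleep(delay) is indented inside scroll_down, so get_images_from_google only defines that inner function and returns None. Second, time.sleep.delay is an attribute lookup rather than a call, and the bare except hides the resulting AttributeError. A minimal reproduction without Selenium, using hypothetical function names and placeholder data:

```python
import time


def broken(delay):
    def scroll_down():
        time.sleep(delay)

        # Indented one level too deep: this belongs to scroll_down(),
        # which is never called, so none of it ever runs.
        found = {"a.png", "b.png"}
        return found
    # No return at this level, so broken() returns None.


def fixed(delay):
    def scroll_down():
        time.sleep(delay)

    # Dedented: this is now the body of fixed() itself.
    scroll_down()
    found = {"a.png", "b.png"}
    return found


def sleep_typo_is_swallowed():
    """Return True if a bare except hides the AttributeError."""
    try:
        time.sleep.delay  # attribute lookup, not a call
    except:  # mirrors the bare except in the listing
        return True
    return False


print(broken(0))  # None
print(fixed(0))
print(sleep_typo_is_swallowed())
```

If this matches the real script, dedenting the block from url = ... through return image_urls by one level and writing time.sleep(delay) would be the first things to try.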
Answer #1 (hc2pp10m)

The error may be in these lines.

images = wd.find_elements(By.CLASS_NAME, "n3VNCb")

should be something like

images = wd.find_elements(By.CLASS_NAME, "iPVvYb")
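
Class names such as "n3VNCb" and "iPVvYb" are generated by Google and can change without notice, so either value may stop matching. A minimal sketch of one way to cope, assuming a hypothetical helper find_by_any_class (not part of Selenium) that tries several candidate names in order:

```python
def find_by_any_class(driver, class_names):
    """Return the elements for the first class name that matches anything."""
    for name in class_names:
        # "class name" is the string value of Selenium's By.CLASS_NAME.
        elements = driver.find_elements("class name", name)
        if elements:
            return elements
    # None of the candidate class names matched.
    return []
```

It could be called as images = find_by_any_class(wd, ["iPVvYb", "n3VNCb"]). An empty result then suggests the page markup has changed again and the class names need to be looked up in the browser's developer tools.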

Also, in this line

if image.get_attribute('src') and 'http' in image.get_attribute('src'):

you should change http to https:

if image.get_attribute('src') and 'https' in image.get_attribute('src'):
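
Note that 'http' in src is a substring test, so it already matches https URLs as well; switching to 'https' only drops plain http addresses. A slightly stricter sketch checks the start of the string instead. The helper name is_https_url and the sample URLs are illustrative assumptions:

```python
def is_https_url(src):
    """True only for non-empty strings that start with https://."""
    return bool(src) and src.startswith("https://")


# Sample values of the kind an img element's src attribute might hold.
candidates = [
    None,
    "data:image/jpeg;base64,/9j/4AAQ",
    "http://example.com/a.jpg",
    "https://example.com/b.jpg",
]

kept = [src for src in candidates if is_https_url(src)]
print(kept)
```

In the loop, the condition would then read if is_https_url(image.get_attribute('src')):, which also avoids calling get_attribute twice.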
