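I'm running this code to web-scrape and download 5 images from Google. What should happen is that after I run the code, the Chrome browser appears, the code makes the mouse click an image and download it, then scrolls down to another image, clicks and downloads it, and so on, up to five times. Instead, with this code the Chrome browser appears for a second and closes, and nothing else happens, even though the code doesn't throw any Python errors.
I'm using the Blender 3D modeling software as my IDE because I hope to make Blender add-ons with Python code in the future (a Blender add-on is a bit like an extension in Google Chrome: a small piece of software you can install into Blender to extend its functionality). That's why there are extra import lines at the top of my code...
On a related note, I get this warning:
E:\GLOBAL ASSETS\SCRIPTING\Web Scraping Images\web-scraper.blend\web-scraper.py:21: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
This is the only other information in the console after I run the code:
DevTools listening on ws://127.0.0.1:52643/devtools/browser/ea448f70-0066-4d50-bfb8-8671528789b8
Any help would be greatly appreciated.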
import bpy
import subprocess
import sys
import os
import cv2
import random
from random import randrange
from PIL import Image #make sure both pil from c:\users\mjoe6\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages is in blender pip3.exe folder
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import io
import time
# path to python.exe
python_exe = os.path.join(sys.prefix, 'bin', 'python.exe')
py_lib = os.path.join(sys.prefix, 'lib', 'site-packages','pip')
PATH = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe"
wd = webdriver.Chrome(PATH)
def get_images_from_google(wd, delay, max_images):
    def scroll_down(wd):
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)
    url = "https://www.google.com/search?q=cats+2019+IMDb&sxsrf=ALiCzsZmBIp-JZmZv23v6ORoc0VL2NRuxg:1654543304286&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRjquPxpn4AhUymo4IHbC6Cx0Q_AUoAnoECAEQBA&biw=1536&bih=714&dpr=1.25#imgrc=ixH-uoDQFN_gpM"
    wd.get(url)
    image_urls = set()
    while len (image_urls) < max_images:
        scroll_down(wd)
        thumbnails = wd.find_elements(By.CLASS_NAME, "Q4LuWd")
        for img in thumbnails[len(image_urls):max_images]:
            try:
                img.click()
                time.sleep.delay
            except:
                continue
            images = wd.find_elements(By.CLASS_NAME, "n3VNCb")
            for image in images:
                if image.get_attribute('src') and 'http' in image.get_attribute('src'):
                    image_urls.add(image.get_attribute('src'))
                    print(f"Found {len (image_urls)}")
    return image_urls
def download_image(download_path, url, file_name):
    try:
        image_content = requests.get(url).content
        image_file = io.BytesIO(image_content)
        image = Image.open(image_file)
        file_path = download_path + file_name
        with open(file_path, "wb") as f:
            image.save(f, "PNG")
        print("Success")
    except Exception as e:
        print('FAILED -', e)
urls = get_images_from_google(wd, 1, 5)
print(urls)
wd.quit()
1 Answer
The error is probably in these lines,
which should look like
It is also in this line,
where http should be changed to https.
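For illustration, here is a minimal sketch of how the script from the question might be adjusted. It is not a verified fix. It rests on three observations taken from the posted code: `time.sleep.delay` is an attribute lookup rather than a call, so it raises an AttributeError that the bare `except: continue` silently swallows after every click; `download_image` is defined but never called; and `wd.quit()` closes the browser as soon as the function returns. The names `is_web_url`, `make_driver`, `collect_image_urls` and `main` are hypothetical, the shortened search URL is an assumption, and the class names `Q4LuWd` and `n3VNCb` are copied from the question and may no longer match Google's markup. Note also that `'http' in src` already matches `https` URLs, so the sketch checks the URL scheme instead.

```python
import time
from urllib.parse import urlparse


def is_web_url(src):
    """Return True only for absolute http/https URLs (rejects None and data: URIs)."""
    if not src:
        return False
    return urlparse(src).scheme in ("http", "https")


def make_driver(chromedriver_path):
    """Start Chrome via a Service object, which is what the deprecation warning asks for."""
    # Imported lazily so is_web_url can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(chromedriver_path))


def collect_image_urls(wd, url, delay, max_images):
    """Click thumbnails one by one and collect up to max_images full-size image URLs."""
    from selenium.webdriver.common.by import By

    wd.get(url)
    image_urls = set()
    # Class names copied from the question; treat them as assumptions.
    thumbnails = wd.find_elements(By.CLASS_NAME, "Q4LuWd")
    for thumb in thumbnails:
        if len(image_urls) >= max_images:
            break
        try:
            thumb.click()
            time.sleep(delay)  # a real call; `time.sleep.delay` raises AttributeError
        except Exception as exc:
            # Report the problem instead of hiding it behind a bare `except`.
            print("click failed:", exc)
            continue
        for image in wd.find_elements(By.CLASS_NAME, "n3VNCb"):
            src = image.get_attribute("src")
            if is_web_url(src):
                image_urls.add(src)
    return image_urls


def main():
    # Path taken from the question; the search URL is a shortened assumption.
    wd = make_driver(r"E:\GLOBAL ASSETS\SCRIPTING\Web Scraping Images\chromedriver.exe")
    try:
        urls = collect_image_urls(wd, "https://www.google.com/search?q=cats&tbm=isch", 1, 5)
        print(urls)
    finally:
        wd.quit()  # the browser closes here, once collection has finished
```

Calling `main()` would run the collection step. To actually save files, each returned URL would still need to be passed to the question's `download_image` function, for example in a loop over `urls` before `wd.quit()`. If `find_elements` returns an empty list, the class names are the first thing to check in the browser's inspector.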