How do I set up a proxy with authentication in Selenium chromedriver with Python?

clj7thdc posted on 2023-01-30 in Python

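I am writing a script that crawls a website to collect some data, but the site blocks me after too many requests. With a proxy I can send more requests than I currently can. I have integrated the proxy using the Chrome option --proxy-server:
options.add_argument('--proxy-server={}'.format('http://ip:port'))
However, I am using a paid proxy, so it requires authentication, and, as in the screenshot below, Chrome shows an alert box asking for the username and password.


I then tried passing the username and password along with it:
options.add_argument('--proxy-server={}'.format('http://username:password@ip:port'))
That does not seem to work either. I searched for a solution and found the one below, which I tried both with the Chrome extension Proxy Auto Auth and without it: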

proxy = {'address': settings.PROXY,
             'username': settings.PROXY_USER,
             'password': settings.PROXY_PASSWORD}

capabilities = dict(DesiredCapabilities.CHROME)
capabilities['proxy'] = {'proxyType': 'MANUAL',
                             'httpProxy': proxy['address'],
                             'ftpProxy': proxy['address'],
                             'sslProxy': proxy['address'],
                             'noProxy': '',
                             'class': "org.openqa.selenium.Proxy",
                             'autodetect': False,
                             'socksUsername': proxy['username'],
                             'socksPassword': proxy['password']}
options.add_extension(os.path.join(settings.DIR, "extension_2_0.crx")) # proxy auth extension

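Neither of the two approaches works properly. It looks as if it works, because with the code above the proxy authentication alert disappears, but when I checked my IP by googling "what is my IP", I confirmed that the proxy is not being used.
Could anyone help me authenticate against the proxy server in chromedriver?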

iqjalb3h1#

**Selenium Chrome proxy authentication**

*Setting a chromedriver proxy with Selenium using Python*

If you need to use a proxy from Python with the Selenium library and chromedriver, you would normally use the following code (no username or password required):

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# hostname and port are strings, e.g. '192.168.3.2' and '8080'
chrome_options.add_argument('--proxy-server=%s:%s' % (hostname, port))
driver = webdriver.Chrome(options=chrome_options)

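This works fine unless the proxy requires authentication. If the proxy asks you to log in with a username and password, it will not work, and you have to use the trickier solution explained below. By the way, if you whitelist your server's IP address with the proxy provider or server, it should not ask for proxy credentials.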
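Proxy providers often hand out a single URL of the form scheme://user:pass@host:port, while the question reports that passing such a URL to --proxy-server did not work. A minimal sketch, assuming that URL shape, of separating the part that can go into --proxy-server from the credentials (split_proxy_url is a hypothetical helper name, not part of Selenium):

```python
from urllib.parse import urlsplit


def split_proxy_url(proxy_url):
    """Split 'scheme://user:pass@host:port' into a server string and credentials."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URL must include a host and a port")
    # Rebuild the URL without the user:pass@ part
    server = "%s://%s:%d" % (parts.scheme, parts.hostname, parts.port)
    return server, parts.username, parts.password


server, user, password = split_proxy_url(
    "http://proxy-user:proxy-password@192.168.3.2:8080")
# server can be passed as '--proxy-server=%s' % server;
# user and password have to be supplied to Chrome some other way.
```
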

*HTTP proxy authentication with chromedriver in Selenium*

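To set up proxy authentication, we generate a special file and load it into chromedriver dynamically using the code below. This code configures Selenium with chromedriver to use an HTTP proxy that requires authentication with a user/password pair.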

import os
import zipfile

from selenium import webdriver

PROXY_HOST = '192.168.3.2'  # rotating proxy or host
PROXY_PORT = 8080 # port
PROXY_USER = 'proxy-user' # username
PROXY_PASS = 'proxy-password' # password

manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version":"22.0.0"
}
"""

background_js = """
var config = {
        mode: "fixed_servers",
        rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
        }
    };

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
);
""" % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)

def get_chromedriver(use_proxy=False, user_agent=None):
    chrome_options = webdriver.ChromeOptions()
    if use_proxy:
        pluginfile = 'proxy_auth_plugin.zip'

        # Pack the manifest and background script into an extension archive
        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
    # Selenium 4 takes `options=`; the positional driver path and the
    # `chrome_options=` keyword were removed. chromedriver is found on PATH
    # (Selenium 4.6+ can also locate it automatically).
    driver = webdriver.Chrome(options=chrome_options)
    return driver

def main():
    driver = get_chromedriver(use_proxy=True)
    #driver.get('https://www.google.com/search?q=my+ip+address')
    driver.get('https://httpbin.org/ip')

if __name__ == '__main__':
    main()

The function get_chromedriver returns a configured Selenium webdriver that you can use in your application. This code has been tested and works well.
Read more about the onAuthRequired event in Chrome.
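The question mentions that the auth prompt disappeared while the IP stayed the same, so it may be worth checking the result of the https://httpbin.org/ip request programmatically. That endpoint returns JSON with an "origin" field. A rough sketch, assuming you know the proxy's exit IP (parse_httpbin_origin and proxy_in_use are hypothetical helper names; with a rotating proxy the exit IP may differ from PROXY_HOST):

```python
import json


def parse_httpbin_origin(body_text):
    """Extract the caller IP from the JSON that https://httpbin.org/ip returns."""
    return json.loads(body_text)["origin"]


def proxy_in_use(body_text, expected_ip):
    """Return True if the IP seen by httpbin matches the proxy's exit IP."""
    return parse_httpbin_origin(body_text) == expected_ip


# Hypothetical usage with the driver returned by get_chromedriver():
#   from selenium.webdriver.common.by import By
#   driver.get('https://httpbin.org/ip')
#   body = driver.find_element(By.TAG_NAME, 'body').text
#   print(proxy_in_use(body, '203.0.113.7'))  # 203.0.113.7 is a placeholder exit IP
```
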

093gszye2#

Use Selenium Wire.
Sample code from the documentation:

HTTP proxy

from seleniumwire import webdriver

options = {
    'proxy': {
        'http': 'http://user:pass@192.168.10.100:8888',
        'https': 'https://user:pass@192.168.10.100:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

driver = webdriver.Chrome(seleniumwire_options=options)

SOCKS proxy

from seleniumwire import webdriver

options = {
   'proxy': {
        'http': 'socks5://user:pass@192.168.10.100:8888',
        'https': 'socks5://user:pass@192.168.10.100:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)

To install:

pip install selenium-wire
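To avoid hard-coding credentials, the options dictionary can be assembled from separate values. A small sketch, assuming the same proxy endpoint and scheme serve both HTTP and HTTPS traffic (build_seleniumwire_options and the PROXY_USER / PROXY_PASS environment variable names are hypothetical):

```python
import os


def build_seleniumwire_options(host, port, user, password, scheme="http"):
    """Return a seleniumwire_options dict shaped like the documentation example."""
    proxy_url = "%s://%s:%s@%s:%s" % (scheme, user, password, host, port)
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            "no_proxy": "localhost,127.0.0.1",
        }
    }


# PROXY_USER / PROXY_PASS are hypothetical environment variable names
options = build_seleniumwire_options(
    host="192.168.10.100",
    port=8888,
    user=os.environ.get("PROXY_USER", "user"),
    password=os.environ.get("PROXY_PASS", "pass"),
)
# With seleniumwire's webdriver imported:
#   driver = webdriver.Chrome(seleniumwire_options=options)
```

Passing scheme="socks5" produces the shape used in the SOCKS example.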

mlmc2os53#

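Here is a quick and creative solution that does not require modifying Selenium's options or uploading a file to chromedriver. It uses pyautogui (any Python package that simulates key presses will do) to type in the proxy authentication details. It also uses threads to deal with Chrome's authentication popup, which would otherwise pause the script.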

import time
from threading import Thread
import pyautogui
from selenium.webdriver.chrome.options import Options
from selenium import webdriver

hostname = "HOST_NAME"
port = "PORT"
proxy_username = "USERNAME"
proxy_password = "PASSWORD"

chrome_options = Options()
chrome_options.add_argument('--proxy-server={}'.format(hostname + ":" + port))
driver = webdriver.Chrome(options=chrome_options)

def enter_proxy_auth(proxy_username, proxy_password):
    time.sleep(1)
    pyautogui.typewrite(proxy_username)
    pyautogui.press('tab')
    pyautogui.typewrite(proxy_password)
    pyautogui.press('enter')

def open_a_page(driver, url):
    driver.get(url)

Thread(target=open_a_page, args=(driver, "http://www.example.com/")).start()
Thread(target=enter_proxy_auth, args=(proxy_username, proxy_password)).start()

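Note: for any serious project or test suite I would recommend a more robust solution. However, if you are just experimenting and need something quick that works, this is an option.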

bvpmtnay4#

I was looking for the same answer, but for Java code, so here is my variant of @itsmnthn's Python code.

Don't forget to change the String fields of the MainTest class to your IP, port, login, password, and chromedriver path.

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

import java.io.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MainTest {
    private static final String PROXY_HOST = "127.0.0.1";
    private static final String PROXY_PORT = "8080";
    private static final String PROXY_USER = "login";
    private static final String PROXY_PASS = "password";
    private static final String CHROMEDRIVER_PATH = "chromeDriverPath";
    private static final String PROXY_OPTION_TEMPLATE = "--proxy-server=http://%s";

    public static void main(String[] args) throws IOException {
        System.setProperty("webdriver.chrome.driver", CHROMEDRIVER_PATH);
        ChromeOptions options = new ChromeOptions();
        String manifest_json = "{\n" +
                "  \"version\": \"1.0.0\",\n" +
                "  \"manifest_version\": 2,\n" +
                "  \"name\": \"Chrome Proxy\",\n" +
                "  \"permissions\": [\n" +
                "    \"proxy\",\n" +
                "    \"tabs\",\n" +
                "    \"unlimitedStorage\",\n" +
                "    \"storage\",\n" +
                "    \"<all_urls>\",\n" +
                "    \"webRequest\",\n" +
                "    \"webRequestBlocking\"\n" +
                "  ],\n" +
                "  \"background\": {\n" +
                "    \"scripts\": [\"background.js\"]\n" +
                "  },\n" +
                "  \"minimum_chrome_version\":\"22.0.0\"\n" +
                "}";

        String background_js = String.format("var config = {\n" +
                "  mode: \"fixed_servers\",\n" +
                "  rules: {\n" +
                "    singleProxy: {\n" +
                "      scheme: \"http\",\n" +
                "      host: \"%s\",\n" +
                "      port: parseInt(%s)\n" +
                "    },\n" +
                "    bypassList: [\"localhost\"]\n" +
                "  }\n" +
                "};\n" +
                "\n" +
                "chrome.proxy.settings.set({value: config, scope: \"regular\"}, function() {});\n" +
                "\n" +
                "function callbackFn(details) {\n" +
                "return {\n" +
                "authCredentials: {\n" +
                "username: \"%s\",\n" +
                "password: \"%s\"\n" +
                "}\n" +
                "};\n" +
                "}\n" +
                "\n" +
                "chrome.webRequest.onAuthRequired.addListener(\n" +
                "callbackFn,\n" +
                "{urls: [\"<all_urls>\"]},\n" +
                "['blocking']\n" +
                ");", PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS);

        FileOutputStream fos = new FileOutputStream("proxy_auth_plugin.zip");
        ZipOutputStream zipOS = new ZipOutputStream(fos);

        createFile("manifest.json", manifest_json);
        createFile("background.js", background_js);

        File file = new File("proxy_auth_plugin.zip");
        writeToZipFile("manifest.json", zipOS);
        writeToZipFile("background.js", zipOS);
        zipOS.close();
        fos.close();
        options.addExtensions(file);

        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://2ip.ru");
        } finally {
            driver.close();
        }

    }

    public static void writeToZipFile(String path, ZipOutputStream zipStream) throws FileNotFoundException, IOException {
        System.out.println("Writing file : '" + path + "' to zip file");
        File aFile = new File(path);
        FileInputStream fis = new FileInputStream(aFile);
        ZipEntry zipEntry = new ZipEntry(path);
        zipStream.putNextEntry(zipEntry);
        byte[] bytes = new byte[1024];
        int length;
        while ((length = fis.read(bytes)) >= 0) {
            zipStream.write(bytes, 0, length);
        }
        zipStream.closeEntry();
        fis.close();
    }

    public static void createFile(String filename, String text) throws FileNotFoundException {
        try (PrintWriter out = new PrintWriter(filename)) {
            out.println(text);
        }
    }

}

k10s72fa5#

I ran into the same problem. Isn't it possible to combine selenium-wire with the headless option? This code works for me. Is there anything wrong with it?

from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options
import os, sys, time
from dotenv import load_dotenv, find_dotenv

path = os.path.abspath (os.path.dirname (sys.argv[0]))
cd = '/chromedriver.exe'
load_dotenv(find_dotenv()) 
PROXY_CHEAP_USER = os.environ.get("PROXY_CHEAP_USER")
PROXY_CHEAP_PW= os.environ.get("PROXY_CHEAP_PW")
PROXY_HOST = 'proxyhost.com'  # rotating proxy or host
PROXY_PORT = port # port
PROXY_USER = PROXY_CHEAP_USER # username
PROXY_PASS = PROXY_CHEAP_PW # password

options = Options()
options.add_argument('--headless')
options.add_argument("--window-size=1920,1080")
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')

options_seleniumWire = {
    'proxy': {
        'https': f'https://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}',
    }
}
 
driver = webdriver.Chrome (path + cd, options=options, seleniumwire_options=options_seleniumWire)
driver.get("https://ifconfig.co/")

I think this solution also works in headless mode.

bttbmeg06#

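Since it does not seem possible to configure chromedriver directly to use a proxy that requires authentication, you can run a local downstream proxy that requires no authentication. This local proxy then forwards all requests to the "real" proxy you originally wanted to use and supplies the required authentication.
I have used tinyproxy for this. Add the following line to the tinyproxy configuration (tinyproxy.conf):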

upstream http user:pass@host:port

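Make sure to replace user, pass, host, and port with the values of the proxy you want to use.
You can then configure chromedriver to use tinyproxy as described in the earlier answers. Tinyproxy runs on port 8888 by default, so you can reach it at 127.0.0.1:8888. As mentioned in this answer, using a proxy that needs no authentication is straightforward: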

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=127.0.0.1:8888')
driver = webdriver.Chrome(options=chrome_options)
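If the local proxy is started by the same script or shortly before it, Chrome may be launched before the proxy is listening. A minimal sketch of waiting for the local port first, using only the standard library (wait_for_port is a hypothetical helper; 127.0.0.1:8888 is the tinyproxy default mentioned above):

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not up yet, retry shortly
    return False


# Hypothetical usage before creating the driver:
#   if not wait_for_port('127.0.0.1', 8888):
#       raise RuntimeError('local proxy is not reachable')
```
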

dvtswwa37#

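Somewhere along the way, after an update, the extension-based solution stopped working (at least on Windows), while Mac and Linux are fine. I think chromedriver v2.44 was the last version that worked with the extension.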

t2a7ltrp8#

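There are *several workarounds* for this problem, but currently there is no way to handle the authentication dialog in Selenium. See this issue:
There is currently no way to handle an HTTP authentication prompt when navigating to a page; only pre-authenticating with the username/password in the URL works (and, apparently, not without workarounds in some browsers such as IE).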
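The pre-authentication mentioned here means putting the credentials into the page URL itself. This concerns a site's own HTTP authentication prompt, not the proxy prompt, and browser support may vary. A small sketch of building such a URL (with_basic_auth is a hypothetical helper; the credentials are percent-encoded so that characters such as "@" do not break the URL):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, user, password):
    """Return url with 'user:password@' inserted before the host."""
    parts = urlsplit(url)
    netloc = "%s:%s@%s" % (quote(user, safe=""), quote(password, safe=""),
                           parts.netloc)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


# Hypothetical usage:
#   driver.get(with_basic_auth('https://example.com/private', 'user', 'p@ss'))
```
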

hmmo2u0o9#

Here is the manifest version 3 that goes with the latest versions of Chrome and @itsmnthn's solution:

{
    "version": "1.0.0",
    "manifest_version": 3,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "webRequest",
        "webRequestAuthProvider"
        ],
    "host_permissions": [
        "<all_urls>"
    ],
    "background": {
        "service_worker": "background.js"
    },
    "minimum_chrome_version":"22.0.0"
}

And a C# class that handles this (again, based on @itsmnthn's solution):

public class ProxiedChromeClient : IDisposable
{
    private const string MANIFEST_JSON = @"
    {
        ""version"": ""1.0.0"",
        ""manifest_version"": 3,
        ""name"": ""Chrome Proxy"",
        ""permissions"": [
            ""proxy"",
            ""tabs"",
            ""unlimitedStorage"",
            ""storage"",
            ""webRequest"",
            ""webRequestAuthProvider""
            ],
        ""host_permissions"": [
            ""<all_urls>""
        ],
        ""background"": {
            ""service_worker"": ""background.js""
        },
        ""minimum_chrome_version"":""22.0.0""
    }";

    private const string BACKGROUND_JS = @"
    var config = {{
        mode: ""fixed_servers"",
        rules: {{
            singleProxy: {{
                scheme: ""{0}"",
                host: ""{1}"",
                port: parseInt({2})
            }},
            bypassList: [""localhost""]
        }}
    }};

    chrome.proxy.settings.set({{value: config, scope: ""regular""}}, function() {{}});

    function callbackFn(details) {{
        return {{
            authCredentials: {{
                username: ""{3}"",
                password: ""{4}""
            }}
        }};
    }}

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {{urls: [""<all_urls>""]}},
        ['blocking']
    );";

    protected ProxiedChromeClient(ProxyInfo proxy = null)
    {
        var options = new ChromeOptions();
        if (proxy != null)
        {
            extensionPath = CreateProxyExtension(proxy);
            options.AddExtension(extensionPath);
        }

        chromeDriverInstance = new ChromeDriver(options);
    }

    protected readonly ChromeDriver chromeDriverInstance;
    private readonly object @lock = new();
    private readonly string extensionPath;

    private static string CreateProxyExtension(ProxyInfo proxy)
    {
        // per https://stackoverflow.com/a/55582859/307584
        var tempFile = Path.GetTempFileName();
        using var z = new ZipArchive(new FileStream(tempFile, FileMode.Create), ZipArchiveMode.Create);
        var entry = z.CreateEntry("manifest.json");
        using (var writer = new StreamWriter(entry.Open()))
        {
            writer.Write(MANIFEST_JSON);
        }

        entry = z.CreateEntry("background.js");
        var url = new Uri(proxy.Url);
        using (var writer = new StreamWriter(entry.Open()))
        {
            writer.Write(BACKGROUND_JS, url.Scheme, url.Host, url.Port, proxy.User, proxy.Password);
        }

        return tempFile;
    }

    public void Dispose()
    {
        lock (@lock)
        {
            chromeDriverInstance.Quit();
            if (extensionPath != null)
            {
                File.Delete(extensionPath);
            }
        }
    }
}
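Since the question is about Python, here is a rough Python port of the same idea: the version 3 manifest from this answer and the background script are written into a zip archive. This is a sketch, not verified against any particular Chrome release. write_proxy_extension is a hypothetical helper name, and json.dumps is used to quote the values so that credentials containing quotes or backslashes do not break the generated JavaScript.

```python
import json
import zipfile

# Manifest copied from the answer above, expressed as a Python dict
MANIFEST_V3 = {
    "version": "1.0.0",
    "manifest_version": 3,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "webRequest", "webRequestAuthProvider"],
    "host_permissions": ["<all_urls>"],
    "background": {"service_worker": "background.js"},
    "minimum_chrome_version": "22.0.0",
}

BACKGROUND_JS = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {scheme: %(scheme)s, host: %(host)s, port: %(port)d},
        bypassList: ["localhost"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(user)s, password: %(password)s}};
    },
    {urls: ["<all_urls>"]},
    ['blocking']
);
"""


def write_proxy_extension(zip_path, scheme, host, port, user, password):
    """Write manifest.json and background.js into a zip archive at zip_path."""
    background = BACKGROUND_JS % {
        "scheme": json.dumps(scheme),      # json.dumps yields a quoted JS string
        "host": json.dumps(host),
        "port": int(port),
        "user": json.dumps(user),
        "password": json.dumps(password),
    }
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("manifest.json", json.dumps(MANIFEST_V3, indent=4))
        archive.writestr("background.js", background)
    return zip_path
```

The resulting path can then be passed to chrome_options.add_extension(), as in the first answer.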
