Chrome 502 Bad Gateway Cloudflare error when web scraping with Selenium

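I'm currently using Selenium in Python to scrape an online database. The way the database is laid out requires navigating between pages to reach the data I'm interested in, and every time I run my code I eventually hit a 502 Bad Gateway error.

The error does go away sometimes, but that seems to depend on where in the loop the 502 pops up. Any advice on how to avoid this would be greatly appreciated. For reference, here is the part of my code that interacts with Chrome: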

import math
import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

#### Define Driver & Starting URL ####
# Location of chromedriver
driver_path = "/Users/shrey/Desktop/Python Projects/Selenium/chromedriver"

# Beginning url & initialize driver
url = "https://tamu.libguides.com/az.php"
driver = webdriver.Chrome()

# Make driver wait for elements to load when find_element() is run for the rest of our code
driver.implicitly_wait(10)

# Launch driver
driver.get(url)

# Press "Ancestry Database" link
driver.find_element(By.LINK_TEXT,
                    "Ancestry Library").click()

# Give time for user to login to database
time.sleep(30)

# Go to link where we can search from
home = "https://www.ancestrylibrary.com/search/collections/1742/"
driver.get(home)

# Switch to first tab (Search tab we just opened)
driver.switch_to.window(driver.window_handles[0])

#### Loop through each year present in the data ####
for yr in range(1886, 1952):
    # Go to search home
    driver.get(home)
    
    # Find textbox & Input Year --------
    year_input = driver.find_element(By.CSS_SELECTOR, "#sfs_SelfCivilYear")
    year_input.send_keys(str(yr))

    # Press "search" button
    driver.find_element(By.CSS_SELECTOR, "#searchButton").click()

    # Determine number of times we need to loop --------
    # Find text which includes total number of results (formatted as "Results 1–20 of 1,351")
    n_raw = driver.find_element(By.XPATH,
                                '//*[@id="results-header"]/h3').text

    # Isolate the important number (1,351)
    n_num = n_raw.split()[-1]  # pulls the last word from the string - our desired number

    # Remove comma and convert to number ("1,351" >>> 1351)
    n_total = int(re.sub(",", "", n_num))

    # Determine number of loops we need to do to scrape all the data
    loop_count = math.ceil(n_total / 20)  # 20 results per page; ceil avoids an extra pass when n_total is a multiple of 20

    # Loop thru pages and collect links --------
    # Init empty list
    links = []
    
    # Loop n times (calc'd earlier)
    for i in range(loop_count):
        # Find & Store all "View Result" links on the current page
        current_pg_links = driver.find_elements(By.CSS_SELECTOR,
                                                ".srchFoundDB a")

        # Get actual url from each 'href' attribute & append to final list
        for link in current_pg_links:
            links.append(link.get_attribute('href'))

        # Press "next page" button on every page except the last one
        if i < loop_count - 1:
            driver.find_element(By.CSS_SELECTOR,
                                "a.ancBtn.sml.green.icon.iconArrowRight").click()

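502 Bad Gateway Cloudflare error

A 502 Bad Gateway Cloudflare error occurs when Cloudflare cannot establish a valid connection to the website's origin web server. Although the error relates to the server side (i.e. the web host), it can also occur if the Cloudflare service itself is down or misconfigured.

Details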

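When you visit a website, the client sends a request to the web server. The web server receives and processes the request, then sends back the requested resource along with HTTP headers and an HTTP status code. Normally you never see the status code unless something goes wrong: it is the server's way of telling you that an error occurred, along with a code that helps diagnose it. When a website uses Cloudflare, however, requests are sent to Cloudflare first, before they reach the origin web server. If Cloudflare cannot get a valid response from that origin server, or if Cloudflare itself is down or misconfigured, the client receives a 502 Bad Gateway error.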
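To make the status-code part concrete, here is a minimal sketch using only the Python standard library. Selenium's WebDriver API does not expose HTTP status codes or response headers, so the sketch assumes you obtained them some other way (for instance from a separate HTTP client). The function names and the sample header values are made up for illustration, and the CF-RAY / Server check is only a heuristic for "this response came through Cloudflare".

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the numeric code plus its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def served_via_cloudflare(headers: dict) -> bool:
    """Heuristic: Cloudflare-proxied responses normally carry a CF-RAY
    header and/or a 'Server: cloudflare' header."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return "cf-ray" in lowered or lowered.get("server", "").lower() == "cloudflare"


# Hypothetical header values, for illustration only
sample_headers = {"Server": "cloudflare", "CF-RAY": "hypothetical-ray-id"}

print(describe_status(502))
print(served_via_cloudflare(sample_headers))
```

If the headers show Cloudflare in the path, a 502 may come either from Cloudflare or from the origin behind it; the headers alone do not say which.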

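Depending on your web server and browser, you may see different 502 errors, but they all mean the same thing:

- 502 Bad Gateway
- Error 502
- 502 Proxy Server
- HTTP 502
- 502 Proxy Error
- Error (502)
- HTTP Error 502 - Bad Gateway
- 502 Bad Gateway Nginx
- Server Error: The web server encountered a temporary error and could not complete your request
- 502 Error
- 502 Service Temporarily Overloaded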

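Some websites also customize how their 502 gateway error looks. All variants mean the same thing, though: the server acting as a proxy did not receive a valid response from the origin server.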
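Because a scraper only sees the rendered page, one way to notice a 502 inside a Selenium loop is to match the page title and visible text against the common wordings. The following is a rough sketch, not an exhaustive detector: the function name is hypothetical, the phrase list is an assumption based on typical 502 wordings, and a site with a custom error page may not match any of them.

```python
import re

# Common 502 wordings, matched case-insensitively. This list is an
# assumption; extend it for the site you are scraping.
_ERROR_502 = re.compile(
    r"502 bad gateway|error 502|http 502|502 proxy error|error \(502\)"
    r"|502 service temporarily overloaded|502 error|bad gateway",
    re.IGNORECASE,
)


def looks_like_502(title: str, body_text: str = "") -> bool:
    """Return True if the title, or the start of the body text,
    contains one of the known 502 phrases."""
    # Only the first 500 characters of the body are checked, since an
    # error page states the error near the top.
    return bool(_ERROR_502.search(title) or _ERROR_502.search(body_text[:500]))


print(looks_like_502("502 Bad Gateway"))
print(looks_like_502("Search results", "Results 1-20 of 1,502"))
```

With Selenium this could be called as looks_like_502(driver.title, driver.find_element(By.TAG_NAME, "body").text) right after each driver.get() or click.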

Cause

The two possible causes of a 502 Bad Gateway Cloudflare error are:

- A 502 status code from the origin web server
- A 502 error from Cloudflare

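Solution

A 502 Bad Gateway Cloudflare error is a network/server problem, but occasionally it can be a client-side problem as well. Some common steps to fix the error and get back up and running are:

- Clear the browser cache and reload the page.
- Check for DNS server issues.
- Check with your host.
- Temporarily disable the Cloudflare proxy.
- Temporarily disable the CDN or firewall.
- Check for plugin/theme conflicts.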