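I'm currently using Selenium in Python to scrape an online database. The database's layout requires navigating between pages to reach the data I'm interested in, and every time I run my code I run into a 502 Bad Gateway error (pictured below).
The error *sometimes* goes away, but whether it does seems to depend on where in the loop the 502 pops up. Any advice on how to avoid this would be greatly appreciated. For reference, I've included the part of my code that interacts with Chrome below: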
# ! Final !
import math
import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

#### Define Driver & Starting URL ####
# Location of chromedriver (defined here but not passed to webdriver.Chrome() below)
driver_path = "/Users/shrey/Desktop/Python Projects/Selenium/chromedriver"
# Beginning url & initialize driver
url = "https://tamu.libguides.com/az.php"
driver = webdriver.Chrome()
# Make driver wait for elements to load when find_element() is run for the rest of our code
driver.implicitly_wait(10)
# Launch driver
driver.get(url)
# Press "Ancestry Library" link
driver.find_element(By.LINK_TEXT, "Ancestry Library").click()
# Give time for user to login to database
time.sleep(30)
# Go to link where we can search from
home = "https://www.ancestrylibrary.com/search/collections/1742/"
driver.get(home)
# Switch to first tab (Search tab we just opened)
driver.switch_to.window(driver.window_handles[0])
#### Loop through each year present in the data ####
for yr in range(1886, 1952):
    # Go to search home
    driver.get(home)

    # Find textbox & Input Year --------
    year_input = driver.find_element(By.CSS_SELECTOR, "#sfs_SelfCivilYear")
    year_input.send_keys(str(yr))
    # Press "search" button
    driver.find_element(By.CSS_SELECTOR, "#searchButton").click()

    # Determine number of times we need to loop --------
    # Find text which includes total number of results (formatted as "Results 1–20 of 1,351")
    n_raw = driver.find_element(By.XPATH, '//*[@id="results-header"]/h3').text
    # Isolate the important number (1,351): the last word of the string
    n_num = n_raw.split()[-1]
    # Remove comma and convert to number ("1,351" >>> 1351)
    n_total = int(re.sub(",", "", n_num))
    # Number of 20-result pages; ceil avoids an extra page when n_total is a multiple of 20
    loop_count = math.ceil(n_total / 20)

    # Loop thru pages and collect links --------
    # Init empty list
    links = []
    # Loop n times (calc'd earlier)
    for i in range(loop_count):
        # Find & Store all "View Result" links
        current_pg_links = driver.find_elements(By.CSS_SELECTOR, ".srchFoundDB a")
        # Loop through all links pulled & append the url from each 'href' attribute
        for link in current_pg_links:
            links.append(link.get_attribute("href"))
        # On every iteration except the last, press the "next page" button
        if i < loop_count - 1:
            driver.find_element(By.CSS_SELECTOR,
                                "a.ancBtn.sml.green.icon.iconArrowRight").click()
1 Answer
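502 Bad Gateway Cloudflare error
A 502 Bad Gateway Cloudflare error occurs when Cloudflare cannot establish a valid connection to your website's origin web server. Although this error relates to the server side (i.e. your web host), it can also occur if the Cloudflare service is down or misconfigured.
Details
When you visit a website, the client sends a request to the web server. The web server receives and processes the request, then sends back the requested resource along with HTTP headers and an HTTP status code. Normally you never see the status code unless something goes wrong. When a website uses Cloudflare, however, the request passes through Cloudflare before it reaches the origin server, and Cloudflare answers with 502 Bad Gateway if it cannot get a valid response from that origin. The status code is the server's way of telling you that an error occurred, along with a code that helps diagnose it.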
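
To make the status-code classes concrete, here is a small sketch using Python's standard `http.HTTPStatus` enum. `describe_status` is a hypothetical helper name, used only for illustration:

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code, its standard reason phrase, and which side it blames."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    if 500 <= code <= 599:
        side = "server error"
    elif 400 <= code <= 499:
        side = "client error"
    else:
        side = "not an error"
    return f"{code} {status.phrase} ({side})"


print(describe_status(502))
```

A 5xx code such as 502 means the failure was reported by a server or proxy on the path, not by the requested page's own content.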
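For example:
Depending on your web server and browser, you may see different 502 errors, but they all mean the same thing:
Some websites also customize how their 502 gateway error looks. All of the variations mean the same thing, though: the server acting as a proxy did not receive a valid response from the origin server.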
Causes
The two possible causes of this 502 Bad Gateway Cloudflare error are:
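Solution
A 502 Bad Gateway Cloudflare error is a network/server problem, but it can sometimes be a client-side problem as well. Some common client-side steps for fixing the error and getting back up and running are: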
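
For the scraping loop in the question specifically, one client-side step is to pause and reload whenever a 502 page shows up, rather than carrying on with the next `find_element` call. Below is a minimal sketch, not a definitive fix. It assumes the error page contains the text "502 Bad Gateway" in its title or body, which the source does not confirm. `looks_like_502` and `get_with_retry` are hypothetical helper names, and `driver` is any object exposing Selenium's `get`, `title`, and `page_source`:

```python
import time


def looks_like_502(driver):
    """Heuristic check; the exact wording of the error page is an assumption."""
    marker = "502 Bad Gateway"
    return marker in driver.title or marker in driver.page_source


def get_with_retry(driver, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Load `url`, reloading with exponential backoff while a 502 page is shown.

    Returns the number of attempts used, or raises RuntimeError if every
    attempt ended on a 502 page.
    """
    for attempt in range(1, max_attempts + 1):
        driver.get(url)
        if not looks_like_502(driver):
            return attempt
        if attempt < max_attempts:
            # Wait 2 s, 4 s, 8 s, ... (with the defaults) before trying again
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"still getting 502 after {max_attempts} attempts: {url}")
```

In the question's code this would replace the plain `driver.get(home)` calls, e.g. `get_with_retry(driver, home)`. Spacing requests out also reduces load on the site, which may matter if the 502s are triggered by request volume, though that is a guess and not something the source establishes.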
tl; dr