Selenium reached error page: The server at x is taking too long to respond

Posted by uqcuzwp8 on 2022-11-24 in Other

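I want to deploy my app on Heroku. The app scrapes data from apartment listing websites, and for each URL I have several selectors. The app runs on a schedule with APScheduler. The logs show the following error: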

2020-08-10T11:02:56.259319+00:00 app[clock.1]: Running main
2020-08-10T11:04:34.374167+00:00 app[clock.1]: Job "main (trigger: interval[3:00:00], next run at: 2020-08-10 14:02:56 UTC)" raised an exception
2020-08-10T11:04:34.374183+00:00 app[clock.1]: Traceback (most recent call last):
2020-08-10T11:04:34.374184+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
2020-08-10T11:04:34.374184+00:00 app[clock.1]: retval = job.func(*job.args, **job.kwargs)
2020-08-10T11:04:34.374185+00:00 app[clock.1]: File "/app/scraper/common.py", line 70, in main
2020-08-10T11:04:34.374186+00:00 app[clock.1]: driver.get(listing.url)
2020-08-10T11:04:34.374187+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
2020-08-10T11:04:34.374188+00:00 app[clock.1]: self.execute(Command.GET, {'url': url})
2020-08-10T11:04:34.374188+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
2020-08-10T11:04:34.374189+00:00 app[clock.1]: self.error_handler.check_response(response)
2020-08-10T11:04:34.374189+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
2020-08-10T11:04:34.374190+00:00 app[clock.1]: raise exception_class(message, screen, stacktrace)
2020-08-10T11:04:34.374191+00:00 app[clock.1]: selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=netTimeout&u=x&d=The%20server%20at%20x%20is%20taking%20too%20long%20to%20respond.

Decoded:
about:neterror?e=netTimeout&u=x&d=The server at x is taking too long to respond.
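The decoded text above can be produced with the standard library alone. A minimal sketch (the helper name `decode_neterror` is hypothetical, not part of Selenium) that splits Firefox's `about:neterror` URL into its query parameters:

```python
from urllib.parse import urlsplit, parse_qs


def decode_neterror(url):
    """Return the query parameters of a Firefox about:neterror URL as a dict.

    Hypothetical helper: e = error code, u = target URL, d = human-readable description.
    """
    query = urlsplit(url).query
    # parse_qs percent-decodes the values (%20 -> space); keep the first value per key
    return {key: values[0] for key, values in parse_qs(query).items()}


message = ("about:neterror?e=netTimeout&u=x&d=The%20server%20at%20x"
           "%20is%20taking%20too%20long%20to%20respond.")
print(decode_neterror(message)["d"])
```

Here `e=netTimeout` is the part that matters: Firefox gave up waiting for the server at `x` to answer.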
If I open the link myself, I can access it. I have disabled JavaScript and images so that the pages load faster.
I'm not sure what the problem is here.
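For reference, one common way to turn off images and JavaScript in Selenium's Firefox is through browser preferences. This is a sketch under assumptions: the helper name is hypothetical, and `options` is assumed to be a Selenium `FirefoxOptions` object (anything with a `set_preference(name, value)` method works). Keep in mind that these settings only reduce what the browser downloads and runs once the server answers; they cannot help when the server never responds, which is what `netTimeout` reports.

```python
def disable_js_and_images(options):
    """Set Firefox preferences that stop image loading and page JavaScript.

    Hypothetical helper; `options` is assumed to be a selenium FirefoxOptions
    instance or any object with a set_preference(name, value) method.
    """
    # 2 = block images for all sites
    options.set_preference("permissions.default.image", 2)
    # Turn off JavaScript for page content
    options.set_preference("javascript.enabled", False)
    return options
```

Typical use would be `driver = webdriver.Firefox(options=disable_js_and_images(webdriver.FirefoxOptions()))`, optionally followed by `driver.set_page_load_timeout(...)` to control how long `driver.get` waits before raising.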

Answer #1 (tpgth1q7)

It turned out that the target website was blocking Heroku. The solution was to use a proxy.
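The answer does not say which proxy was used or how it was configured. Since the `about:neterror` page in the question comes from Firefox, one possible approach is Firefox's manual proxy preferences. A minimal sketch: the helper name and the host/port values are hypothetical placeholders, and `options` is assumed to be a Selenium `FirefoxOptions` object.

```python
def use_manual_proxy(options, host, port):
    """Route Firefox HTTP and HTTPS traffic through host:port.

    Hypothetical helper; `options` is assumed to be a selenium FirefoxOptions
    instance (any object with set_preference(name, value) works).
    """
    options.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
    options.set_preference("network.proxy.http", host)
    options.set_preference("network.proxy.http_port", port)
    # "ssl" here is the proxy used for HTTPS URLs
    options.set_preference("network.proxy.ssl", host)
    options.set_preference("network.proxy.ssl_port", port)
    return options
```

For example, `use_manual_proxy(webdriver.FirefoxOptions(), "proxy.example.com", 8080)`, where the proxy address is a placeholder for whatever proxy service you actually use.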

Answer #2 (yzuktlbb)

I think you want to wait until the element you are looking for is present:

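from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

# browser: an existing WebDriver instance (e.g. the `driver` from the question)
delay = 10  # maximum number of seconds to wait (example value)

try:
    # Block until an element with this ID is in the DOM, or raise TimeoutException
    my_element = WebDriverWait(browser, delay).until(
        EC.presence_of_element_located((By.ID, 'ID_of_element'))
    )
    print("Page is ready")
except TimeoutException:
    print("Loading took too much time")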

Answer #3 (b4wnujal)

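I ran into the same problem; maybe this will help someone else avoid it. If the website uses http but you enter https, you get exactly this error as well.
Example: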

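Correct: the site is served at http://some-website.com, so use

driver.get('http://some-website.com')

Wrong: requesting https://some-website.com for the same site produces this error.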
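If you are not sure which scheme a site serves, one option is to retry the same address with the other scheme when `driver.get` fails. A minimal sketch of the URL part, using only the standard library (`with_other_scheme` is a hypothetical helper name):

```python
from urllib.parse import urlsplit, urlunsplit


def with_other_scheme(url):
    """Return `url` with http and https swapped; other schemes are returned unchanged."""
    parts = urlsplit(url)
    swap = {"http": "https", "https": "http"}
    # Keep host, path, query and fragment; only the scheme changes
    return urlunsplit(parts._replace(scheme=swap.get(parts.scheme, parts.scheme)))


print(with_other_scheme("https://some-website.com"))
```

In a scraper you might catch the `WebDriverException` from `driver.get(url)` and try `driver.get(with_other_scheme(url))` once before giving up.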
