Nutch crawl using protocol-selenium and PhantomJS launched as a Mesos task: org.openqa.selenium.NoSuchElementException

9bfwbjaz  posted on 2021-06-26  in Mesos

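I am trying to crawl an AJAX-based site with Nutch, using protocol-selenium with the PhantomJS driver. I am using apache-nutch-1.13, compiled from Nutch's GitHub repository. The crawls are launched as tasks in a system managed by Mesos. When I launch Nutch's crawl script from a terminal on the server, everything works and the site is crawled the way I want. However, when I run the same crawl script with the same parameters inside a Mesos task, Nutch throws an exception: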

fetch of http://XXXXX failed with: java.lang.RuntimeException: org.openqa.selenium.NoSuchElementException: {"errorMessage":"Unable to find element with tag name 'body'","request":{"headers":{"Accept-Encoding":"gzip,deflate","Connection":"Keep-Alive","Content-Length":"35","Content-Type":"application/json; charset=utf-8","Host":"localhost:12215","User-Agent":"Apache-HttpClient/4.3.5 (java 1.5)"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"tag name\",\"value\":\"body\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7f98ec0-b8aa-11e6-8b84-232b0d8e1024/element"}}

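My first impression was that something was off with the environment variables (HADOOP_HOME, PATH, CLASSPATH, ...), but I set the same variables in the nutch script as in the terminal and the result is still the same.
Any idea what I am doing wrong?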
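
To double-check the environment comparison described above, a minimal sketch like the following could dump the full process environment once from the terminal and once from inside the Mesos task, then list every variable that differs. The function names and file paths are hypothetical and are not part of Nutch, Selenium, or Mesos; it only assumes a Python interpreter is available in both places. A mismatch found this way is a hint, not proof of the cause.

```python
import json
import os


def snapshot_env(path):
    """Write the current process environment to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dict(os.environ), fh, indent=2, sort_keys=True)


def load_env(path):
    """Read back an environment snapshot written by snapshot_env."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def diff_env(left, right):
    """Return {name: (left_value, right_value)} for variables that differ.

    A value of None means the variable is missing on that side.
    """
    diffs = {}
    for key in sorted(set(left) | set(right)):
        if left.get(key) != right.get(key):
            diffs[key] = (left.get(key), right.get(key))
    return diffs
```

One way to use it: call `snapshot_env("/tmp/env_terminal.json")` from the terminal session, call `snapshot_env("/tmp/env_mesos.json")` at the start of the Mesos task command, then print `diff_env(load_env(...), load_env(...))`. Comparing the whole environment rather than a few chosen variables can surface differences that are easy to overlook, such as the user, working directory, or locale settings.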
