gcloud dataproc

kmbjn2e3  posted on 2021-05-29  in Hadoop

When I try to run Nutch on Hadoop on Google Cloud (Dataproc), I get the following error. Any idea why I am running into this?

user@cluster-1-m:~/apache-nutch-1.7/build$ hadoop jar /home/user/apache-nutch-1.7/runtime/deploy/apache-nutch-1.7.job org.apache.nutch.crawl.Crawl /tmp/testnutch/input/urls.txt -solr http://SOLRIP:8080/solr/ -depth 5 -topN2

16/09/11 17:57:38 INFO crawl.Crawl: crawl started in: crawl-20160911175737
16/09/11 17:57:38 INFO crawl.Crawl: rootUrlDir = -topN2
16/09/11 17:57:38 INFO crawl.Crawl: threads = 10
16/09/11 17:57:38 INFO crawl.Crawl: depth = 5
16/09/11 17:57:38 INFO crawl.Crawl: solrUrl=http://SOLRIP:8080/solr/
16/09/11 17:57:38 WARN conf.Configuration: Could not make crawl/20160911175738 in local directories from mapreduce.cluster.local.dir
16/09/11 17:57:38 WARN conf.Configuration: mapreduce.cluster.local.dir[0]=/hadoop/mapred/local
Exception in thread "main" java.io.IOException: No valid local directories in property: mapreduce.cluster.local.dir
    at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:2302)
    at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:569)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:123)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

c9qzyr3d  #1

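You get this exception because you are running the job as the user "user", who is not in the "hadoop" group, so the driver cannot access the local directory. Try the following: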

sudo sudo -u mapred hadoop jar \
    /home/user/apache-nutch-1.7/runtime/deploy/apache-nutch-1.7.job \
    org.apache.nutch.crawl.Crawl /tmp/testnutch/input/urls.txt \
    -solr http://SOLRIP:8080/solr/ -depth 5 -topN 2

Alternatively, if you would rather submit through the Dataproc jobs API than SSH into the cluster, Dataproc will also run the job with sufficient permissions:

gcloud dataproc jobs submit hadoop --cluster cluster-1 \
    --jar apache-nutch-1.7.jar \
    -- org.apache.nutch.crawl.Crawl /tmp/testnutch/input/urls.txt \
    -solr http://SOLRIP:8080/solr/ -depth 5 -topN 2
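
If you want to check the permissions explanation on your own cluster first, a small helper along these lines may be useful. This is only a sketch: check_local_dir is a hypothetical name, not part of Hadoop or Dataproc, and it simply reports whether the user running it can write to a given directory.

```shell
# check_local_dir DIR
# Hypothetical helper (not a Hadoop or Dataproc command): reports whether
# the user running it can create files in DIR.
# Exit status: 0 = writable, 1 = exists but not writable, 2 = not a directory.
check_local_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi
    # Creating files in a directory needs both write and search (x) permission.
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable: $dir"
        return 0
    fi
    echo "not writable by $(id -un): $dir"
    return 1
}
```

On the master node you could call it with the directory named in the warning, /hadoop/mapred/local, once as your own user and once under sudo -u mapred, and compare the results. Running id -nG shows which groups the current user belongs to, so you can see whether "hadoop" is among them.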
