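I am integrating Nutch with HBase and Solr.
After starting the Hadoop and HBase services, I run the following command from the Nutch home directory: sudo -E bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2
I get the following errors: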
Injecting seed URLs
/usr/local/apache-nutch-2.3.1/runtime/local/bin/nutch inject urls/seed.txt -crawlId TestCrawl
InjectorJob: starting at 2016-05-26 15:41:14
InjectorJob: Injecting urlDir: urls/seed.txt
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:114)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
Error running:
/usr/local/apache-nutch-2.3.1/runtime/local/bin/nutch inject urls/seed.txt -crawlId TestCrawl
Failed with exit value 1.
Can anyone tell me what is wrong here?
1 Answer
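This error means a transitive dependency could not be found on the classpath when the crawl script ran. The missing class in the stack trace, org.apache.hadoop.hbase.HBaseConfiguration, is packaged in the hbase-common jar.
A combination that works better is nutch-2.3.1 with hbase-0.98.8-hadoop2.
For more background, see the tutorial:
https://wiki.apache.org/nutch/nutch2tutorial
In addition, add the missing transitive dependency hbase-common-0.98.8-hadoop2.jar. It is left out because of a bug in gora-hbase 0.6.1.
With that change I was able to crawl successfully.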
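One way to add the jar by hand is sketched below. This is only a sketch under assumptions: ensure_jar is a hypothetical helper name, the Nutch lib directory is inferred from the path in the log above, and /usr/local/hbase/lib is a guess at where HBase keeps its jars, so adjust both to your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch or HBase): copy a jar into a
# destination lib directory unless a file with that name is already there.
ensure_jar() {
  local jar_name="$1" src_dir="$2" dest_dir="$3"

  # Nothing to do if the jar is already on the destination side.
  if [ -e "$dest_dir/$jar_name" ]; then
    echo "present: $dest_dir/$jar_name"
    return 0
  fi

  # Fail clearly if the source directory does not contain the jar.
  if [ ! -e "$src_dir/$jar_name" ]; then
    echo "missing in source: $src_dir/$jar_name" >&2
    return 1
  fi

  cp "$src_dir/$jar_name" "$dest_dir/" && echo "copied: $jar_name"
}

# Example call. Both directories are assumptions about the local layout:
#   ensure_jar hbase-common-0.98.8-hadoop2.jar \
#     /usr/local/hbase/lib \
#     /usr/local/apache-nutch-2.3.1/runtime/local/lib
```

After copying the jar, rerun the same bin/crawl command. If you build Nutch from source instead, declaring the dependency in the build configuration and rebuilding is probably the cleaner route, since a rebuild may regenerate the runtime directory.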