I am trying to get Nutch and HBase working together, based on the following Docker image: https://hub.docker.com/r/cogfor/nutch/
I get an exception when I try to inject a URL file:
InjectorJob: starting at 2017-12-19 20:49:45
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:114)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I understand that something is misconfigured between Nutch, HBase, and Hadoop.
My gora.properties has:
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
My hbase-site.xml has:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///data</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>false</value>
</property>
</configuration>
My nutch-site.xml has:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>http.agent.name</name>
<value>My Spider</value>
</property>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(text|tika|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>true</value>
</property>
<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
</property>
<property>
<name>http.content.limit</name>
<value>6553600</value>
</property>
</configuration>
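The same error has been reported on Stack Overflow many times, but none of the solutions worked for me. The $HBASE_HOME and $HADOOP_CLASSPATH environment variables are set as follows: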
root@a5fb7fefc53e:/nutch_source/runtime/local/bin# echo $HADOOP_CLASSPATH
/opt/hbase-0.98.21-hadoop2/lib/hbase-client-0.98.21-hadoop2.jar:
/opt/hbase-0.98.21-hadoop2/lib/hbase-common-0.98.12-hadoop2.jar:
/opt/hbase-0.98.21-hadoop2/lib/protobuf-java-2.5.0.jar:
/opt/hbase-0.98.21-hadoop2/lib/guava-12.0.1.jar:
/opt/hbase-0.98.21-hadoop2/lib/zookeeper-3.4.6.jar:
/opt/hbase-0.98.21-hadoop2/lib/hbase-protocol-0.98.12-hadoop2.jar
root@a5fb7fefc53e:/nutch_source/runtime/local/bin# echo $HBASE_HOME
/opt/hbase-0.98.21-hadoop2
I have confirmed that all of these files exist. Can anyone help me figure out what I am missing?
1 Answer
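This problem is mentioned in the documentation (https://wiki.apache.org/nutch/nutch2tutorial):
"N.B. It's possible to encounter the following exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration; this is caused by the fact that sometimes the HBase TEST jar is deployed in the lib dir. To resolve this, just copy the lib over from your installed HBase dir into the build lib dir. (This issue is currently in progress.)"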
All you need to do is:
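Following the tutorial note quoted above, the copy might look like the sketch below. The helper name `copy_hbase_libs` is hypothetical, and the destination `/nutch_source/runtime/local/lib` is an assumption inferred from the working directory shown in the question; adjust both paths to your own installation.

```shell
# Hypothetical helper: copy every jar from an HBase lib directory
# into the lib directory that Nutch loads its jars from.
copy_hbase_libs() {
    src="$1"   # e.g. "$HBASE_HOME/lib"
    dst="$2"   # assumed Nutch lib dir, e.g. /nutch_source/runtime/local/lib

    # Refuse to continue if the HBase lib directory is not there.
    [ -d "$src" ] || { echo "missing source dir: $src" >&2; return 1; }

    mkdir -p "$dst" || return 1
    cp "$src"/*.jar "$dst"/
}

# Example invocation (paths are assumptions based on the question):
# copy_hbase_libs "$HBASE_HOME/lib" /nutch_source/runtime/local/lib
```

Copying the whole lib directory, rather than a hand-picked list of jars, keeps the HBase client and its dependencies on matching versions.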
Nutch will then start working.
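If the error persists and you are still relying on a hand-written classpath, it may help to check each entry mechanically, since a mistyped version number or a stray space is easy to miss by eye. The following is a minimal sketch; `missing_classpath_entries` is a hypothetical helper name, and it only tests that each colon-separated entry exists on disk, not that the jar contains the expected class.

```shell
# Hypothetical helper: print every entry of a colon-separated classpath
# that does not exist on disk. Entries are deliberately NOT trimmed, so an
# entry with a leading space is reported as missing.
missing_classpath_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        # Skip empty entries produced by a trailing or doubled colon.
        [ -z "$entry" ] && continue
        [ -e "$entry" ] || printf '%s\n' "$entry"
    done
}

# Example invocation:
# missing_classpath_entries "$HADOOP_CLASSPATH"
```

An empty result means every listed entry exists; anything printed is an entry that cannot be found exactly as written.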