Nutch indexing fails with java.lang.NoSuchFieldError: INSTANCE

Posted by byqmnocz on 2021-06-01 in Hadoop

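I am using Nutch 1.13 to crawl data and store it in Elasticsearch. I have also written some custom parse filter and indexing filter plugins. Everything was working fine.
I then upgraded Elasticsearch to version 5, and the indexer-elastic plugin stopped working because of a version mismatch. From some documentation I also learned that Elasticsearch 5 is only supported by Nutch 2+.
However, I wanted to stay on this Nutch version, so I found a plugin that indexes into Elasticsearch over REST and changed my Nutch setup to include it.
Crawling and indexing then worked in Nutch's local mode. When I tried the same thing in deployed mode, the indexing phase failed with the following exception: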

17/11/16 10:53:37 INFO mapreduce.Job: Running job: job_1510809462003_0010
17/11/16 10:53:44 INFO mapreduce.Job: Job job_1510809462003_0010 running in uber mode : false
17/11/16 10:53:44 INFO mapreduce.Job:  map 0% reduce 0%
17/11/16 10:53:48 INFO mapreduce.Job:  map 20% reduce 0%
17/11/16 10:53:52 INFO mapreduce.Job:  map 40% reduce 0%
17/11/16 10:53:56 INFO mapreduce.Job:  map 60% reduce 0%
17/11/16 10:53:59 INFO mapreduce.Job:  map 80% reduce 20%
17/11/16 10:54:02 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:02 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_0, Status : FAILED
Error: INSTANCE
17/11/16 10:54:03 INFO mapreduce.Job:  map 100% reduce 0%
17/11/16 10:54:06 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_1, Status : FAILED
Error: INSTANCE
17/11/16 10:54:10 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_2, Status : FAILED
Error: INSTANCE
17/11/16 10:54:15 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:15 INFO mapreduce.Job: Job job_1510809462003_0010 failed with state FAILED due to: Task failed task_1510809462003_0010_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

17/11/16 10:54:15 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=804602
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=44204
HDFS: Number of bytes written=0
HDFS: Number of read operations=20
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters 
Failed reduce tasks=4
Killed map tasks=1
Launched map tasks=5
Launched reduce tasks=4
Data-local map tasks=5
Total time spent by all maps in occupied slots (ms)=39484
Total time spent by all reduces in occupied slots (ms)=16866
Total time spent by all map tasks (ms)=9871
Total time spent by all reduce tasks (ms)=16866
Total vcore-milliseconds taken by all map tasks=9871
Total vcore-milliseconds taken by all reduce tasks=16866
Total megabyte-milliseconds taken by all map tasks=40431616
Total megabyte-milliseconds taken by all reduce tasks=17270784
Map-Reduce Framework
Map input records=436
Map output records=436
Map output bytes=55396
Map output materialized bytes=56302
Input split bytes=698
Combine input records=0
Spilled Records=436
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=246
CPU time spent (ms)=3840
Physical memory (bytes) snapshot=1559916544
Virtual memory (bytes) snapshot=25255698432
Total committed heap usage (bytes)=1503657984
File Input Format Counters 
Bytes Read=43506
17/11/16 10:54:15 ERROR impl.JobWorker: Cannot run job worker!
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:94)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:87)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:352)
at org.apache.nutch.service.impl.JobWorker.run(JobWorker.java:71)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

The Hadoop log shows:

2017-11-16 10:54:13,731 INFO [main] org.apache.nutch.indexer.IndexWriters: Adding org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter
2017-11-16 10:54:13,801 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter.open(ElasticRestIndexWriter.java:133)
    at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:75)
    at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:39)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:484)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

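After searching for this error, I found that it is caused by a version conflict among the HTTP jars. I am using Hadoop 2.7.2; I also tried Hadoop 2.8.2, with the same result.
I am looking for a solution.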
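One way to narrow down this kind of conflict is to check which jar a class actually comes from on a given classpath. The following is a minimal sketch, not part of the original post: the class name JarLocator and the method locate are hypothetical, and the two default class names are assumptions (the first is taken from the stack trace above, the second is a class that is expected to live in httpcore). The class is loaded without running its static initializer, because the static initializer is where the NoSuchFieldError is thrown.

```java
import java.security.CodeSource;

// Hypothetical helper: reports the jar (or directory) a class is loaded from.
class JarLocator {

    // Loads the class WITHOUT initializing it, so a failing <clinit> is not triggered.
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = JarLocator.class.getClassLoader();
        }
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes usually have no code source location.
                return className + " -> (no code source, probably a JDK class)";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass other class names as arguments to check them instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.http.conn.ssl.SSLConnectionSocketFactory",
            "org.apache.http.HttpVersion"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

Assuming the hadoop command is on the PATH, the sketch could be compiled and run as java -cp "$(hadoop classpath):." JarLocator. This only approximates what a reduce task sees, since the task classpath also includes the job's own jars, but it shows which HTTP jars the Hadoop installation itself contributes.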
Resolved: the problem was the older httpcore jars that ship with Hadoop 2.7.2. Moving those jars out of the way solved it.
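Before moving any jars, it helps to see every copy that is present. The sketch below is an illustration under assumptions, not the procedure used in the post: the class name HttpJarScan and the method findHttpJars are hypothetical, and the file name prefixes httpclient- and httpcore- are assumed to match how the jars are named in the installation. It walks a directory tree and lists the matching jars so that duplicate versions stand out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists httpclient/httpcore jars found under a directory.
class HttpJarScan {

    static List<Path> findHttpJars(Path root) throws IOException {
        // Files.walk must be closed, hence try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> {
                    String name = p.getFileName().toString();
                    // Assumed naming scheme: httpclient-<version>.jar, httpcore-<version>.jar
                    return name.endsWith(".jar")
                        && (name.startsWith("httpclient-") || name.startsWith("httpcore-"));
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the directory to scan, for example the Hadoop installation directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : findHttpJars(root)) {
            System.out.println(jar);
        }
    }
}
```

Running it once against the Hadoop installation directory and once against the Nutch runtime directory makes it easy to compare the versions on each side.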
