Nutch 1.8 and Apache Solr 4.8 integration: indexing job fails

Posted by cmssoen2 on 2022-11-05 in Solr

I am trying to crawl the web with Nutch 1.8 and Solr 4.8 on Windows 7, using this command:

bin/crawl urls newsolr http://localhost:8983/solr/ 1 -depth 1

I keep getting the following error:

Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Here is the relevant part of the Nutch log file:

2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: content dest: content
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: title dest: title
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: host dest: host
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: url dest: id
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: url dest: url
2014-07-01 16:58:33,643 INFO  solr.SolrIndexWriter - Indexing 1 documents
2014-07-01 16:58:33,773 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Method Not Allowed

Method Not Allowed

request: http://localhost:8983/solr/
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2014-07-01 16:58:34,628 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Finally, the Solr error log shows:

org.apache.solr.common.SolrException: ERROR: [doc=http://.com/] unknown field 'tstamp'

This is my first Solr/Nutch setup.

Answer #1 (8ftvxx2r)

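Just stop the Solr instance and start it again. That should fix your problem. The error occurs because you changed the schema file but did not restart Solr afterwards. Solr only reads the schema when it starts (or when the core is reloaded), so the running instance cannot "see" the newly added fields such as tstamp.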
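If a restart does not help, it may be worth confirming that the schema file Solr actually loads declares every field Nutch sends. The following is a minimal sketch of such a check, not part of Nutch or Solr: the function name `missing_fields` and the sample schema are hypothetical, the field list is copied from the SolrMappingReader log lines in the question, and dynamic fields (`dynamicField`) are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Destination fields copied from the SolrMappingReader log lines in the question.
NUTCH_DEST_FIELDS = ["content", "title", "host", "segment", "boost",
                     "digest", "tstamp", "id", "url"]


def missing_fields(schema_xml, required=NUTCH_DEST_FIELDS):
    """Return the names in `required` that the schema text does not declare.

    Only explicit <field name="..."> elements are considered; a field that
    would match a <dynamicField> pattern is still reported as missing.
    """
    root = ET.fromstring(schema_xml)
    declared = {field.get("name") for field in root.iter("field")}
    return [name for name in required if name not in declared]


# Hypothetical, heavily shortened schema used only to demonstrate the check.
SAMPLE_SCHEMA = """<schema name="sample" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="url" type="string" indexed="true" stored="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="false"/>
  </fields>
</schema>"""

print(missing_fields(SAMPLE_SCHEMA))
```

To check a real setup, pass the contents of the schema.xml used by the core you index into instead of `SAMPLE_SCHEMA`. An empty list means every mapped field is declared, in which case a stale running instance (as described above) is the more likely cause of the "unknown field" error.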
