Nutch Solr data import handler?

Posted by 6tdlim6h on 2021-06-02 in Hadoop

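I have installed a Nutch crawler on Hadoop. The software stack and versions are apache-nutch-2.3.1 and hbase-0.98.8-hadoop2, both running on top of hadoop-2.5.2. Everything up to the point where data is inserted into HBase works fine. The problem is that when I invoke the indexing job through the class org.apache.nutch.indexer.IndexingJob, the command runs successfully, but no records are indexed in Solr. The Solr version is solr-5.3.1.
Here is the output of the command I ran: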

15/12/15 18:26:32 INFO mapreduce.Job: Running job: job_1450175405767_0007
15/12/15 18:26:43 INFO mapreduce.Job: Job job_1450175405767_0007 running in uber mode : false
15/12/15 18:26:43 INFO mapreduce.Job:  map 0% reduce 0%
15/12/15 18:28:00 INFO mapreduce.Job:  map 50% reduce 0%
15/12/15 18:28:22 INFO mapreduce.Job:  map 100% reduce 0%
15/12/15 18:28:22 INFO mapreduce.Job: Job job_1450175405767_0007 completed successfully
15/12/15 18:28:23 INFO mapreduce.Job: Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=230132
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1324
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=2
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Killed map tasks=1
        Launched map tasks=3
        Data-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=192484
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=192484
        Total vcore-seconds taken by all map tasks=192484
        Total megabyte-seconds taken by all map tasks=197103616
    Map-Reduce Framework
        Map input records=3312819
        Map output records=0
        Input split bytes=1324
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=1678
        CPU time spent (ms)=62560
        Physical memory (bytes) snapshot=406765568
        Virtual memory (bytes) snapshot=3877060608
        Total committed heap usage (bytes)=239075328
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
15/12/15 18:28:23 INFO indexer.IndexWriters: Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
15/12/15 18:28:23 INFO indexer.IndexingJob: Active IndexWriters :
SOLRIndexWriter
    solr.server.url : URL of the SOLR instance (mandatory)
    solr.commit.size : buffer size when sending to SOLR (default 1000)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth : use authentication (default false)
    solr.auth.username : username for authentication
    solr.auth.password : password for authentication

15/12/15 18:28:23 INFO conf.Configuration: found resource solrindex-mapping.xml at file:/tmp/hadoop-root/hadoop-unjar491190780945254030/solrindex-mapping.xml
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: content dest: content
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: title dest: title
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: host dest: host
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: batchId dest: batchId
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: boost dest: boost
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: digest dest: digest
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: tstamp dest: tstamp
15/12/15 18:28:23 INFO solr.SolrIndexWriter: Total 0 document is added.
15/12/15 18:28:23 INFO indexer.IndexingJob: IndexingJob: done.
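
For anyone reading the log above: the job reports `Map input records=3312819` but `Map output records=0`, which matches the final `Total 0 document is added.` line. The mapper read rows and emitted none of them, so nothing ever reached Solr. Below is a minimal Python sketch for pulling those counters out of a saved log and flagging this symptom. The function names `parse_counters` and `indexed_nothing` are hypothetical helpers made up for illustration, not part of Nutch or Hadoop, and the check only detects the symptom; it does not say why the rows were skipped.

```python
import re

# Matches counter lines such as "        Map input records=3312819".
# Assumption: each counter sits on its own line and ends in "=<integer>".
COUNTER_RE = re.compile(r"^\s*(.+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return a dict mapping counter name to integer value."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def indexed_nothing(counters):
    """True when the mappers read records but emitted none."""
    read = counters.get("Map input records", 0)
    written = counters.get("Map output records", 0)
    return read > 0 and written == 0


if __name__ == "__main__":
    # Two lines copied from the log in the question.
    sample = (
        "        Map input records=3312819\n"
        "        Map output records=0\n"
    )
    if indexed_nothing(parse_counters(sample)):
        print("Mapper read rows but emitted no documents")
```

Running the same check on the full log gives the same result, because the timestamped `INFO` lines do not end in `=<integer>` and are skipped.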
