Accessing Hive table data with MapReduce

blmhpbnm, posted 2021-06-04 in Hadoop

In a single-node Hadoop 2.2 installation, I am trying to run the Cloudera example "Accessing table data with MapReduce", which copies data from one table to another:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/cdh4/latest/cdh4-installation-guide/cdh4ig_topic_19_6.html
The example code compiles with a number of deprecation warnings (see below). Before running the example from Eclipse, I created the input table "simple" in the Hive default database. I pass the input table "simple" and the output table "simpid" on the command line. Even though the input table already exists in the default database, running the code throws this exception:

java.io.IOException: NoSuchObjectException(message:default.simple table not found)

Questions:
1) Why does the "table not found" exception occur, and how can it be fixed?
2) How can the deprecated HCatRecord, HCatSchema, and HCatBaseInputFormat used in this example be migrated to the current, stable API?

package com.bigdata;

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hcatalog.mapreduce.*;
import org.apache.hcatalog.data.*;
import org.apache.hcatalog.data.schema.*;

public class UseHCat extends Configured implements Tool {

 public static class Map extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
  String groupname;

    @Override
  protected void map( WritableComparable key,
                      HCatRecord value,
                      org.apache.hadoop.mapreduce.Mapper<WritableComparable, HCatRecord,
                      Text, IntWritable>.Context context)
        throws IOException, InterruptedException {
        // The group table from /etc/group has name, 'x', id
        groupname = (String) value.get(0);
        int id = (Integer) value.get(1);
        // Just select and emit the name and ID
        context.write(new Text(groupname), new IntWritable(id));
    }
}

public static class Reduce extends Reducer<Text, IntWritable,
                                   WritableComparable, HCatRecord> {

    protected void reduce( Text key,
                           java.lang.Iterable<IntWritable> values,
                           org.apache.hadoop.mapreduce.Reducer<Text, IntWritable,
                           WritableComparable, HCatRecord>.Context context)
        throws IOException, InterruptedException {
        // Only expecting one ID per group name
        Iterator<IntWritable> iter = values.iterator();
        IntWritable iw = iter.next();
        int id = iw.get();
        // Emit the group name and ID as a record
        HCatRecord record = new DefaultHCatRecord(2);
        record.set(0, key.toString());
        record.set(1, id);
        context.write(null, record);
    }
}

public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    args = new GenericOptionsParser(conf, args).getRemainingArgs();

    // Get the input and output table names as arguments
    String inputTableName = args[0];
    String outputTableName = args[1];
    // Assume the default database
    String dbName = null;

    Job job = new Job(conf, "UseHCat");
    HCatInputFormat.setInput(job, InputJobInfo.create(dbName,
            inputTableName, null));
    job.setJarByClass(UseHCat.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    // An HCatalog record as input
    job.setInputFormatClass(HCatInputFormat.class);

    // Mapper emits a string as key and an integer as value
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    // Ignore the key for the reducer output; emitting an HCatalog record as value
    job.setOutputKeyClass(WritableComparable.class);
    job.setOutputValueClass(DefaultHCatRecord.class);
    job.setOutputFormatClass(HCatOutputFormat.class);

    HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName,
               outputTableName, null));
    HCatSchema s = HCatOutputFormat.getTableSchema(job);
    System.err.println("INFO: output schema explicitly set for writing:" + s);
    HCatOutputFormat.setSchema(job, s);
    return (job.waitForCompletion(true) ? 0 : 1);
}

public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new UseHCat(), args);
    System.exit(exitCode);
 }
}

Running this program on single-node Hadoop 2.2 produces the following exception:

14/03/05 15:17:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/03/05 15:17:22 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/03/05 15:17:22 INFO metastore.ObjectStore: ObjectStore, initialize called
14/03/05 15:17:23 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/03/05 15:17:24 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/03/05 15:17:25 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/03/05 15:17:25 INFO metastore.ObjectStore: Initialized ObjectStore
14/03/05 15:17:27 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/03/05 15:17:27 INFO metastore.HiveMetaStore: 0: get_database: NonExistentDatabaseUsedForHealthCheck
14/03/05 15:17:27 INFO HiveMetaStore.audit: ugi=dk  ip=unknown-ip-addr  cmd=get_database: NonExistentDatabaseUsedForHealthCheck 
14/03/05 15:17:27 ERROR metastore.RetryingHMSHandler: NoSuchObjectException(message:There is no database named nonexistentdatabaseusedforhealthcheck)

at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
at com.sun.proxy.$Proxy6.getDatabase(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
at com.sun.proxy.$Proxy7.get_database(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
at org.apache.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:277)
at org.apache.hcatalog.common.HiveClientCache.get(HiveClientCache.java:147)
at org.apache.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:547)
at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:56)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:48)
at com.bigdata.UseHCat.run(UseHCat.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.bigdata.UseHCat.main(UseHCat.java:91)

14/03/05 15:17:27 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=simple
14/03/05 15:17:27 INFO HiveMetaStore.audit: ugi=dk  ip=unknown-ip-addr  cmd=get_table : db=default tbl=simple   
14/03/05 15:17:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/03/05 15:17:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

     Exception in thread "main" java.io.IOException: NoSuchObjectException(message:default.simple table not found)

at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:89)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:56)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:48)
at com.bigdata.UseHCat.run(UseHCat.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.bigdata.UseHCat.main(UseHCat.java:91)
    Caused by: NoSuchObjectException(message:default.simple table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1373)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
at com.sun.proxy.$Proxy7.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:854)
at org.apache.hcatalog.common.HCatUtil.getTable(HCatUtil.java:193)
at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87)
... 6 more
14/03/05 15:17:29 INFO metastore.HiveMetaStore: 1: Shutting down the object store...
14/03/05 15:17:29 INFO HiveMetaStore.audit: ugi=dk  ip=unknown-ip-addr  cmd=Shutting down the object store...   
14/03/05 15:17:29 INFO metastore.HiveMetaStore: 1: Metastore shutdown complete.
14/03/05 15:17:29 INFO HiveMetaStore.audit: ugi=dk  ip=unknown-ip-addr  cmd=Metastore shutdown complete.
Answer 1 (kq4fsx7k):

It looks like you don't have hive-site.xml on the Eclipse classpath. Hive looks up the metastore server address in the configuration; when it can't find one, it creates or loads an embedded metastore, and that embedded metastore of course does not contain the table you created.
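
If managing the Eclipse classpath is awkward, a minimal alternative sketch is to load the configuration explicitly in run() before calling HCatInputFormat.setInput. The file path and the thrift://localhost:9083 address below are assumed defaults and may differ on your machine:

Configuration conf = getConf();
// Either load the real hive-site.xml explicitly
// (the path below is an assumed default install location)...
conf.addResource(new org.apache.hadoop.fs.Path("/etc/hive/conf/hive-site.xml"));
// ...or point straight at a standalone metastore Thrift service.
conf.set("hive.metastore.uris", "thrift://localhost:9083");

Either way, the goal is the same: make sure the JVM that submits the job sees the real metastore address instead of falling back to an embedded one.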
Edit (in response to your comment):
Yes, you can get a value by key, but to do that you need the HCatSchema for the table. Fetch it in the Mapper's setup phase...

HCatSchema schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration());

...and then, in the map phase:

value.get("field", schema);
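
Putting the pieces together, and touching on question 2 as well: in Hive 0.11 HCatalog was merged into Hive, so the deprecated org.apache.hcatalog.* packages were superseded by org.apache.hive.hcatalog.* with the class names unchanged. Below is a sketch of the schema-aware Mapper against the newer packages; the field names "groupname" and "id" are placeholders matching the question's example, and note that in recent Hive releases getTableSchema takes a Configuration (older releases took a JobContext):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat;

public class SchemaAwareMapper
        extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {

    private HCatSchema schema;

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Fetch the table schema once per task rather than once per record.
        schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration());
    }

    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        // Look fields up by name instead of by position.
        String name = (String) value.get("groupname", schema);
        Integer id = (Integer) value.get("id", schema);
        context.write(new Text(name), new IntWritable(id));
    }
}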
