Create a Hive table from an existing folder of Avro files

Posted by 0ejtzxu1 on 2021-06-02 in Hadoop

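I have a series of Avro folders under /gobblin in my HDFS. I manually created an avsc file based on what I know about the Avro data.
How do I create a Hive table with this avsc, using the existing Avro files that are already in HDFS?
Thank you.
Update #1: I have written a CREATE TABLE script to create the Hive table: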

CREATE EXTERNAL TABLE Claims(
     PlanID int,
     ClaimID int,
     ClaimAmount int,
     PhysicianID string,
     ClaimType string,
     CreateDate string,
     ModifyDate string)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://hostname.com/gobblin/job-output/Claims/Claims'
TBLPROPERTIES ('avro.schema.url'='hdfs://hostname.com/gobblin/claims.avsc');

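The DDL runs, but I cannot query the table. The /gobblin/job-output/Claims/Claims folder contains a series of serialized folders, each with Avro files in it. I want all of them in the table. How can I do that?
Thanks.
Update #2: here is my avsc file: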

{"namespace": "claim.avro",
 "type": "record",
 "name": "claim",
 "fields": [
     {"name": "MemberID", "type": "int"},
     {"name": "PlanID", "type": "int"},
     {"name": "ClaimID", "type": "int"},
     {"name": "ClaimAmount", "type": "int"},
     {"name": "PhysicianID", "type": ["string", "null"]},
     {"name": "ClaimType", "type": ["string", "null"]},
     {"name": "CreateDate", "type": ["string", "null"]},
     {"name": "ModifyDate", "type": ["string", "null"]}
 ]
}
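
One thing visible when the two listings are put side by side: the avsc declares a MemberID field that the CREATE TABLE column list does not. The sketch below only automates that comparison, using the Python standard library. The column list and the schema are copied from this post; the helper names `field_names` and `diff_columns` are made up for illustration. Whether the difference matters depends on how the SerDe derives columns when avro.schema.url is set, so treat it as something to check rather than a diagnosis.

```python
import json

# Column names copied from the CREATE TABLE statement in this post.
ddl_columns = ["PlanID", "ClaimID", "ClaimAmount", "PhysicianID",
               "ClaimType", "CreateDate", "ModifyDate"]

# The avsc from this post, embedded as a string so the sketch is self-contained.
avsc_text = """
{"namespace": "claim.avro",
 "type": "record",
 "name": "claim",
 "fields": [
     {"name": "MemberID", "type": "int"},
     {"name": "PlanID", "type": "int"},
     {"name": "ClaimID", "type": "int"},
     {"name": "ClaimAmount", "type": "int"},
     {"name": "PhysicianID", "type": ["string", "null"]},
     {"name": "ClaimType", "type": ["string", "null"]},
     {"name": "CreateDate", "type": ["string", "null"]},
     {"name": "ModifyDate", "type": ["string", "null"]}
 ]
}
"""


def field_names(avsc):
    """Return the field names of a record schema, in declaration order."""
    return [field["name"] for field in json.loads(avsc)["fields"]]


def diff_columns(ddl_cols, avsc_fields):
    """Return (fields only in the avsc, columns only in the DDL).

    Names are compared case-insensitively, since Hive column names are
    case-insensitive.
    """
    ddl_lower = {c.lower() for c in ddl_cols}
    avsc_lower = {f.lower() for f in avsc_fields}
    only_avsc = [f for f in avsc_fields if f.lower() not in ddl_lower]
    only_ddl = [c for c in ddl_cols if c.lower() not in avsc_lower]
    return only_avsc, only_ddl


only_in_avsc, only_in_ddl = diff_columns(ddl_columns, field_names(avsc_text))
print("only in avsc:", only_in_avsc)  # only in avsc: ['MemberID']
print("only in DDL:", only_in_ddl)    # only in DDL: []
```
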

Here is the stack trace:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1449174649821_0008_3_00, diagnostics=[Task failed, taskId=task_1449174649821_0008_3_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.avro.AvroTypeException: Found Claims.Claims, expecting Claim.avro.Claim
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.avro.AvroTypeException: Found Claims.Claims, expecting Claim.avro.Claim
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:310)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
        ... 14 more
Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found Claims.Claims, expecting Claim.avro.Claim
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:141)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
        ... 16 more
Caused by: org.apache.avro.AvroTypeException: Found Claims.Claims, expecting Claim.avro.Claim
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:231)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:176)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:153)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
        ... 22 more
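
The innermost exception names two record types: the one found in the data files (Claims.Claims) and the one the table's schema expects (printed as Claim.avro.Claim). An Avro container file carries its writer schema in its header, so a hand-written avsc can disagree with it. Below is a minimal sketch of how a record's full name is formed from `namespace` and `name` under the Avro naming rules, standard library only. The `writer` header is an assumption reconstructed from the "Found Claims.Claims" text, not read from an actual file, and `full_name` is a hypothetical helper.

```python
import json


def full_name(schema):
    """Full name of a named Avro schema.

    A name that already contains a dot is taken as the full name;
    otherwise the namespace, if present, is prefixed.
    """
    name = schema["name"]
    namespace = schema.get("namespace")
    if "." in name or not namespace:
        return name
    return namespace + "." + name


# Reader schema header, taken from the avsc in this post (fields omitted).
reader = json.loads(
    '{"namespace": "claim.avro", "type": "record", "name": "claim", "fields": []}'
)

# ASSUMPTION: writer schema header guessed from "Found Claims.Claims" in the
# exception; the real one lives in the header of each Avro data file.
writer = json.loads(
    '{"namespace": "Claims", "type": "record", "name": "Claims", "fields": []}'
)

print(full_name(reader))                       # claim.avro.claim
print(full_name(writer))                       # Claims.Claims
print(full_name(reader) == full_name(writer))  # False
```

If the names really do differ like this, one thing to try is pointing avro.schema.url at a schema whose record name and namespace match what the files contain, rather than at the hand-written one.
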
