I am trying to load my Hive table in Pig using HCatalog. I wrote the code below, but I am getting an error. I start the Pig shell with pig -useHCatalog
Code:
A = LOAD 'patient_info' USING org.apache.hive.hcatalog.pig.HCatLoader();
Error:
ERROR hive.ql.metadata.Table - Unable to get field from serde: com.ibm.spss.hive.serde2.xml.XmlSerDe
java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:275)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:255)
	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:602)
	at org.apache.hive.hcatalog.common.HCatUtil.getTableSchemaWithPtnCols(HCatUtil.java:184)
	at org.apache.hive.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:216)
	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
	at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
	at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:866)
	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1635)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:587)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:547)
	at org.apache.pig.Main.main(Main.java:158)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:400)
Update:
The commands I used to store the data in Hive are given below.
add jar /home/cloudera/hivexmlserde-1.0.5.3.jar;
CREATE EXTERNAL TABLE patient_info (
statusCode string,
title string,
startTime string,
endTime string,
frequencyValue string,
frequencyUnits string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.statusCode"="medicationsInfo/entryInfo/statusCode/text()",
"column.xpath.title"="medications/code/code/text()",
"column.xpath.startTime"="medications/xxx/startTime/text()",
"column.xpath.endTime"="medications/xxx/endTime/text()",
"column.xpath.frequencyValue"="medications/xxx/frequencyValue/text()",
"column.xpath.frequencyUnits"="medications/xxx/frequencyUnits/text()",
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<medicationsInfo",
"xmlinput.end"="</medicationsInfo>");
load data inpath '/user/cloudera/xml' into table patient_info ;
Sample:
<Document>
<ProductCode>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</ProductCode>
<ProductCode>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</ProductCode>
<Medicationsinfo>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</Medicationsinfo>
<Medicationsinfo>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</Medicationsinfo>
</Document>
1 Answer
The definition of the external table is invalid. Here are some options:
Option 1
Option 2
Option 3
Option 3, broken down
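Separately from the options above, one rough way to see that the table definition does not line up with the sample data is to compare the names used in the DDL against one record from the sample. The following is only a diagnostic sketch in Python, not how XmlSerDe itself evaluates XPath: `count_matches` is a hypothetical helper, the record is an abbreviated copy of one `<Medicationsinfo>` element from the question, and the standard library's ElementTree supports only a subset of XPath (it has no `text()` step, so that suffix is dropped before matching).

```python
import xml.etree.ElementTree as ET

# Abbreviated record copied from the sample in the question (one entryInfo only).
SAMPLE_RECORD = """<Medicationsinfo>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
</Medicationsinfo>"""

# Values copied verbatim from the CREATE EXTERNAL TABLE statement in the question.
XMLINPUT_START = "<medicationsInfo"
COLUMN_XPATHS = {
    "statusCode": "medicationsInfo/entryInfo/statusCode/text()",
    "title": "medications/code/code/text()",
    "startTime": "medications/xxx/startTime/text()",
    "endTime": "medications/xxx/endTime/text()",
    "frequencyValue": "medications/xxx/frequencyValue/text()",
    "frequencyUnits": "medications/xxx/frequencyUnits/text()",
}


def count_matches(record_xml, xpath):
    """Hypothetical helper: count elements selected by a simple child-step path."""
    suffix = "/text()"
    # ElementTree has no text() step, so match the element that holds the text.
    path = xpath[:-len(suffix)] if xpath.endswith(suffix) else xpath
    # Wrap the record so the first path step can match the record's own root tag.
    wrapper = ET.Element("wrapper")
    wrapper.append(ET.fromstring(record_xml))
    return len(wrapper.findall(path))


# Plain case-sensitive substring check of the start tag against the sample.
start_tag_found = XMLINPUT_START in SAMPLE_RECORD
matches = {col: count_matches(SAMPLE_RECORD, xp) for col, xp in COLUMN_XPATHS.items()}

print("xmlinput.start found in sample:", start_tag_found)
for col, n in matches.items():
    print(col, "->", n, "match(es)")
```

With this sample, the start tag `<medicationsInfo` is not found because the data spells the element `<Medicationsinfo>`, and none of the column paths select anything: they refer to `medicationsInfo`, `medications`, `xxx`, `frequencyValue` and `frequencyUnits`, while the sample contains `Medicationsinfo`, `entryInfo`, `strengthValue` and `strengthUnits`. The DDL in the question also has a trailing comma after the last entry in SERDEPROPERTIES, which this check does not cover.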