I want to load an XML file into Hive columns, but before doing so I need to translate some fields using a given mapping.
Example:
I have an XML file like this:
<?xml version="1.0"?>
<Company>
<Employee>
<FirstName>Test1</FirstName>
<LastName>toto1</LastName>
<ContactNo>111</ContactNo>
<Email>toto1@xyz.com</Email>
<Address>
<City>Bangalore</City>
<State>Karnataka</State>
<Zip>560212</Zip>
</Address>
</Employee>
<Employee>
<FirstName>Test2</FirstName>
<LastName>toto2</LastName>
<ContactNo>222</ContactNo>
<Email>toto2@xyz.com</Email>
<Address>
<City>Bangalore</City>
<State>Karnataka</State>
<Zip>545454</Zip>
</Address>
</Employee>
<Employee>
<FirstName>Test3</FirstName>
<LastName>toto3</LastName>
<ContactNo>333</ContactNo>
<Email>toto3@xyz.com</Email>
<Address>
<City>Bangalore</City>
<State>Karnataka</State>
<Zip>36363</Zip>
</Address>
</Employee>
</Company>
This is what I did to load the file into Hive, and it works for me:
ADD JAR ${projet}/datagaps_xml_traite/hivexmlserde-1.0.5.3.jar;
DROP TABLE IF EXISTS Employee;
CREATE EXTERNAL TABLE Employee (
FirstName STRING, LastName STRING, ContactNo STRING, Email STRING, City STRING, State STRING, Zip STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.FirstName"="/Employee/FirstName/text()",
"column.xpath.LastName"="/Employee/LastName/text()",
"column.xpath.ContactNo"="/Employee/ContactNo/text()",
"column.xpath.Email"="/Employee/Email/text()",
"column.xpath.City"="/Employee/Address/City/text()",
"column.xpath.State"="/Employee/Address/State/text()",
"column.xpath.Zip"="/Employee/Address/Zip/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/data/myFile.xml/'
TBLPROPERTIES (
"xmlinput.start"="<Employee",
"xmlinput.end"="</Employee>" )
;
Now, I have a mapping like this:
contactNo,identif
111, XXX
222, YYY
333, ZZZ
I want to map each ContactNo to its identif and insert the identif value into Hive.
Can anyone guide me on how to solve this?
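One possible approach, sketched here rather than tested against this setup, is to expose the mapping as a second Hive table and join it to Employee on the contact number. The table names contact_mapping and employee_mapped, the location /data/contact_mapping/, and the assumption that the mapping is stored as a comma-separated file with a header line are all hypothetical. TRIM is used because the sample mapping has a space after each comma.

```sql
-- Hypothetical mapping table over the CSV file (name and location are assumptions).
CREATE EXTERNAL TABLE contact_mapping (
  contactNo STRING,
  identif   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/contact_mapping/'
TBLPROPERTIES ("skip.header.line.count"="1");  -- skip the "contactNo,identif" header line

-- Hypothetical target table: every Employee row plus its identif.
-- LEFT JOIN keeps employees whose ContactNo has no mapping (identif is then NULL).
CREATE TABLE employee_mapped AS
SELECT
  e.FirstName,
  e.LastName,
  e.ContactNo,
  TRIM(m.identif) AS identif,
  e.Email,
  e.City,
  e.State,
  e.Zip
FROM Employee e
LEFT JOIN contact_mapping m
  ON TRIM(e.ContactNo) = TRIM(m.contactNo);
```

If identif should replace ContactNo instead of sitting next to it, dropping e.ContactNo from the SELECT list should be enough. An inner JOIN instead of LEFT JOIN would discard employees that have no mapping entry.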