shell: Load an XML file and map given columns before inserting the data into a Hive table

Posted by mmvthczy on 2021-05-27 in Spark

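I want to load an XML file into the columns of a Hive table, but before that I need to map some fields using a given set of mapping values.
Example:
I have an XML file like this: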

<?xml version="1.0"?>
<Company>
  <Employee>
      <FirstName>Test1</FirstName>
      <LastName>toto1</LastName>
      <ContactNo>111</ContactNo>
      <Email>toto1@xyz.com</Email>
      <Address>
           <City>Bangalore</City>
           <State>Karnataka</State>
           <Zip>560212</Zip>
      </Address>
  </Employee>
    <Employee>
      <FirstName>Test2</FirstName>
      <LastName>toto2</LastName>
      <ContactNo>222</ContactNo>
      <Email>toto2@xyz.com</Email>
      <Address>
           <City>Bangalore</City>
           <State>Karnataka</State>
           <Zip>545454</Zip>
      </Address>
  </Employee>
    <Employee>
      <FirstName>Test3</FirstName>
      <LastName>toto3</LastName>
      <ContactNo>333</ContactNo>
      <Email>toto3@xyz.com</Email>
      <Address>
           <City>Bangalore</City>
           <State>Karnataka</State>
           <Zip>36363</Zip>
      </Address>
  </Employee>
</Company>

I did the following to load the file into Hive, and it works for me.
ADD JAR ${projet}/datagaps_xml_traite/hivexmlserde-1.0.5.3.jar;
DROP TABLE IF EXISTS Employee;

-- External table over the XML file; each <Employee> element becomes one row.
-- Every column is filled from the XPath given in its column.xpath.<name> property.
CREATE EXTERNAL TABLE Employee (
  FirstName STRING, LastName STRING, ContactNo STRING, Email STRING, City STRING, State STRING, Zip STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.FirstName"="/Employee/FirstName/text()",
"column.xpath.LastName"="/Employee/LastName/text()",
"column.xpath.ContactNo"="/Employee/ContactNo/text()",
"column.xpath.Email"="/Employee/Email/text()",
"column.xpath.City"="/Employee/Address/City/text()",
"column.xpath.State"="/Employee/Address/State/text()",
"column.xpath.Zip"="/Employee/Address/Zip/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/data/myFile.xml/'
TBLPROPERTIES (
"xmlinput.start"="<Employee",
"xmlinput.end"="</Employee>"
);

Now, I have a mapping like this:

contactNo,identif
111,       XXX
222,       YYY
333,       ZZZ

I want to map each ContactNo to its identif and insert the identif value into Hive.
Can anyone guide me on how to solve this?
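
To make the intended mapping concrete, here is a minimal sketch of one possible pre-processing step, written in Python with only the standard library. It is an assumption that a script like this would run before the Hive load. The function names `load_mapping` and `attach_identif` and the output key `Identif` are hypothetical and do not come from the post. The sample data is copied from the question, shortened to two employees.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample input copied from the question (two employees only).
XML_TEXT = """<?xml version="1.0"?>
<Company>
  <Employee>
    <FirstName>Test1</FirstName>
    <LastName>toto1</LastName>
    <ContactNo>111</ContactNo>
    <Email>toto1@xyz.com</Email>
    <Address><City>Bangalore</City><State>Karnataka</State><Zip>560212</Zip></Address>
  </Employee>
  <Employee>
    <FirstName>Test2</FirstName>
    <LastName>toto2</LastName>
    <ContactNo>222</ContactNo>
    <Email>toto2@xyz.com</Email>
    <Address><City>Bangalore</City><State>Karnataka</State><Zip>545454</Zip></Address>
  </Employee>
</Company>
"""

# Mapping table from the question, including the padding after the comma.
MAPPING_CSV = """contactNo,identif
111,       XXX
222,       YYY
333,       ZZZ
"""


def load_mapping(csv_text):
    """Return a dict of contactNo -> identif (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return {row["contactNo"].strip(): row["identif"].strip() for row in reader}


def attach_identif(xml_text, mapping):
    """Flatten each <Employee> into a dict and add the mapped identif.

    Employees whose ContactNo is not in the mapping get Identif = None.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for emp in root.iter("Employee"):
        contact = emp.findtext("ContactNo", default="").strip()
        rows.append({
            "FirstName": emp.findtext("FirstName", default=""),
            "LastName": emp.findtext("LastName", default=""),
            "ContactNo": contact,
            "Identif": mapping.get(contact),
            "Email": emp.findtext("Email", default=""),
            "City": emp.findtext("Address/City", default=""),
            "State": emp.findtext("Address/State", default=""),
            "Zip": emp.findtext("Address/Zip", default=""),
        })
    return rows


mapping = load_mapping(MAPPING_CSV)
rows = attach_identif(XML_TEXT, mapping)
for row in rows:
    print(row["ContactNo"], row["Identif"])
```

The resulting rows could then be written to a delimited file and loaded into a Hive table that has an extra column for the identif. An alternative that stays inside Hive would be to load the mapping as its own table and join it to Employee on ContactNo. Neither option has been tested against the setup described here.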
