xpath

6uxekuva  于 2023-05-22  发布在  Hive
关注(0)|答案(3)|浏览(298)

我有下面的xml

<qr>
    <Trade xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ss="http://www.mycomp.com/mycall/schema/1/durables/ss" xmlns:ss-raw="http://www.mycomp.com/api.dsl/tm/2/ss-raw/v1.0">
        <TradeId>
            <ss:SYSTEMID>1466413528</ss:SYSTEMID>
        </TradeId>
        <InstrumentId xsi:nil="true">test_instrument</InstrumentId>
        <TraderSourceSystemName xsi:nil="true">akjsdfklas</TraderSourceSystemName>
    </Trade>    
</qr>

我正在尝试使用

CREATE EXTERNAL TABLE sample(TradeId STRING,
    InstrumentId STRING,
    TraderSourceSystemName STRING
    )
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
    "column.xpath.TradeId"="Trade/TradeId",
    "column.xpath.InstrumentId"="Trade/InstrumentId/text()",
      "column.xpath.TraderSourceSystemName"="Trade/TraderSourceSystemName/text()"
    )
    STORED AS
    INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION "hdfs://server:port/user/sl/sample/"
    TBLPROPERTIES (

  "xmlinput.start"="<Trade xmlns",
  "xmlinput.end"="</Trade>"
      );

当我从表中选择*时,第一个字段如下所示

<pre>
<TradeId><ss:SYSTEMID xmlns:ss="...namespace...">1466413528</ss:SYSTEMID></TradeId>

如何使用行业id- 1466413528

xxb16uws

xxb16uws1#

如何在此xml的op上创建配置单元表。

<root>
<root1>
<id>4545482361</id>`enter code here`
<joiningdate>1/3/2010</joiningdate>
<Segments>
<Segment xse:type="manager">
<cityworked>Hyd</cityworked>
<reports>john</reports>
<salary>150000</salary>
<datestarted>1/3/2012</datestarted>
</Segment>
<Segment xse:type="manager">
<cityworked>Hyd</cityworked>
<reports>mike</reports>
<salary>225000</salary>
<datestarted>1/9/2014</datestarted>
</Segment>
<Segment xse:type="VP">
<cityworked>mumbai</cityworked>
<datestarted>1/9/2014</datestarted>
<subemployees>
<Fname>ram</Fname>
<Lname>Achanta</Lname>
<Desgination>Director of IT</Desgination>
</subemployees>
</Segment>
<Segment xse:type="SVP">
<Staus>currentposition</status>
<numberofemployees>10</numberofemployees>
</Segment>
</Segments>
</root1>
</root>
4urapxun

4urapxun2#

使用xpath //Trade/TradeId/ss:SYSTEMID/text() 为了 column.xpath.TradeId

z0qdvdin

z0qdvdin3#

"column.xpath.TradeId" = "Trade/TradeId/*[local-name(.)='SYSTEMID']/text()"
create external table sample
(
    tradeid                 string
   ,instrumentid            string
   ,tradersourcesystemname  string
)
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'

with serdeproperties 
(
    "column.xpath.TradeId"                  = "Trade/TradeId/*[local-name(.)='SYSTEMID']/text()"
   ,"column.xpath.InstrumentId"             = "Trade/InstrumentId/text()"
   ,"column.xpath.TraderSourceSystemName"   = "Trade/TraderSourceSystemName/text()"
)

stored as
inputformat     'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
outputformat    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'

tblproperties 
(
    "xmlinput.start"    = "<Trade xmlns"
   ,"xmlinput.end"      = "</Trade>"
)
;
select * from sample
;
+------------+-----------------+------------------------+
|  tradeid   |  instrumentid   | tradersourcesystemname |
+------------+-----------------+------------------------+
| 1466413528 | test_instrument | akjsdfklas             |
+------------+-----------------+------------------------+

局限性
...
目前只支持XPath1.0规范。元素和属性的限定名的本地部分在处理配置单元字段名时使用。命名空间前缀将被忽略。
https://github.com/dvasilen/hive-xml-serde/wiki/xml-data-sources

相关问题