hiveql和xpath-如何提取值和替换某些字符

hc2pp10m  于 2021-06-04  发布在  Hadoop
关注(0)|答案(1)|浏览(423)

我在配置单元日志表中存储了一个xml blob(如下所示)。

<user>
    <uid>1424324325</uid>
    <attribs>
        <field>
        ...
        </field>
        <field>
            <name>first</name>
            <value>Joh,n</value>
        </field>
        <field>
        ...
        </field>
        <field>
            <name>last</name>
            <value>D,oe</value>
        </field>
        <field>
        ...
        </field>
    </attribs>
</user>

配置单元表中的每一行都有关于不同用户的信息,我想提取uid、first name和last name的值(从名称中删除任何逗号)。

1424324325  John    Doe
1424435463  Jane    Smith

我能够从xml中提取值。

SELECT uid, fn, ln
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;

然而,我在试图从first name和last name中删除不必要的逗号(如果存在的话)时遇到了麻烦。
当我尝试使用下面显示的任何方法提取名字时,结果是空的。

LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/replace(text(),",","")')) fns as fn

LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/translate(text(),",","")')) fns as fn

当我按如下所示尝试时,replace会抱怨函数无效,而translate会在不删除多余逗号的情况下提取数据。

LATERAL VIEW explode(xpath(logs['users_updates'], replace('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn

LATERAL VIEW explode(xpath(logs['users_updates'], translate('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn

如果名称值中没有逗号,如何提取信息?

1424324325  John    Doe
1424435463  Jane    Smith

最终解决方案:这是jens建议之后的最终工作查询

SELECT uid, regexp_replace(fn,","," ") as fname, regexp_replace(ln,","," ") as lname
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;
dwthyt8l

dwthyt8l1#

配置单元中不支持xpath 2.0。这会影响你的问题两次:
不允许在axis步骤中调用函数。而 //value/translate(text(), ',', '') (这需要为每个 <value/> 元素)是有效的xpath 2.0,不能在xpath 1.0中执行此操作。 translate(//value, ',', '') 另一方面,返回所有 <value/> 作为单个字符串连接的项。
根本没有 replace XPath1.0中的函数。
传递包含逗号的值并在配置单元中执行字符串操作可能更容易。
另请注意,因为您还没有XPath2.0: translate 只需要一个字符串作为第一个参数。你需要 string-join 以前是这样的。

相关问题