我在配置单元日志表中存储了一个xml blob(如下所示)。
<user>
<uid>1424324325</uid>
<attribs>
<field>
...
</field>
<field>
<name>first</name>
<value>Joh,n</value>
</field>
<field>
...
</field>
<field>
<name>last</name>
<value>D,oe</value>
</field>
<field>
...
</field>
</attribs>
</user>
配置单元表中的每一行都有关于不同用户的信息,我想提取uid、first name和last name的值(从名称中删除任何逗号)。
1424324325 John Doe
1424435463 Jane Smith
我能够从xml中提取值。
SELECT uid, fn, ln
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;
然而,我在试图从first name和last name中删除不必要的逗号(如果存在的话)时遇到了麻烦。
当我尝试使用下面显示的任何方法提取名字时,结果是空的。
LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/replace(text(),",","")')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/translate(text(),",","")')) fns as fn
当我按如下所示尝试时,replace会抱怨函数无效,而translate会在不删除多余逗号的情况下提取数据。
LATERAL VIEW explode(xpath(logs['users_updates'], replace('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], translate('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn
如果名称值中没有逗号,如何提取信息?
1424324325 John Doe
1424435463 Jane Smith
最终解决方案:这是jens建议之后的最终工作查询
SELECT uid, regexp_replace(fn,","," ") as fname, regexp_replace(ln,","," ") as lname
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;
1条答案
按热度按时间dwthyt8l1#
配置单元中不支持xpath 2.0。这会影响你的问题两次:
不允许在axis步骤中调用函数。而
//value/translate(text(), ',', '')
(这需要为每个<value/>
元素)是有效的xpath 2.0,不能在xpath 1.0中执行此操作。translate(//value, ',', '')
另一方面,返回所有<value/>
作为单个字符串连接的项。根本没有
replace
XPath1.0中的函数。传递包含逗号的值并在配置单元中执行字符串操作可能更容易。
另请注意,因为您还没有XPath2.0:
translate
只需要一个字符串作为第一个参数。你需要string-join
以前是这样的。