Hive: parsing strings from a log

2guxujil  posted on 2021-06-26  in  Hive

I'm having trouble parsing strings from a log file. The records look like this:

"skey":"110","scp_id":"OC05","capedge":"3G"
"skey":"140","scp_id":"OC02","capedge":"3G"
"skey":"0","scp_id":"OC01","capedge":"3G"

This is the expected output for our table:

|   skey    |   scp_id  |   capedge |
|   110     |   OC05    |   3G      |
|   140     |   OC02    |   3G      |
|   0       |   OC01    |   3G      |

I tried the functions listed at https://cwiki.apache.org/confluence/display/hive/languagemanual+udf, but unfortunately our strings are not in URL format. Is there a better way, or do I have to use regexp_extract?
Thank you, 加利


b1zrtrql1#

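You can use the SPLIT function together with REGEXP_EXTRACT:
```
-- Inner query: split each record on commas into its three "key":"value" pieces.
-- Outer query: extract the quoted value that follows the colon in each piece.
-- Note: in a Hive string literal the backslash must be doubled, so \w is written \\w.
select REGEXP_EXTRACT(skey, ':"(\\w+)"', 1) as skey,
       REGEXP_EXTRACT(scp_id, ':"(\\w+)"', 1) as scp_id,
       REGEXP_EXTRACT(capedge, ':"(\\w+)"', 1) as capedge
from (
    select SPLIT(log_record, ',')[0] as skey,
           SPLIT(log_record, ',')[1] as scp_id,
           SPLIT(log_record, ',')[2] as capedge
    from yourtable
) a;
```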
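
As a possible alternative, since each record already looks like the body of a JSON object, you could wrap it in braces and let Hive's built-in `json_tuple` pull the fields out by name. This is only a sketch under some assumptions: the table `yourtable` and column `log_record` are the same placeholder names used above, and every record is assumed to become valid JSON once `{` and `}` are added.

```sql
-- Assumption: log_record holds lines like "skey":"110","scp_id":"OC05","capedge":"3G".
-- concat() wraps the record in braces so it parses as a JSON object;
-- json_tuple() then returns the requested keys as string columns.
SELECT t.skey, t.scp_id, t.capedge
FROM yourtable
LATERAL VIEW json_tuple(
    concat('{', log_record, '}'),
    'skey', 'scp_id', 'capedge'
) t AS skey, scp_id, capedge;
```

Compared with splitting on commas, this looks fields up by key, so it should not depend on the order of the fields or break if a value itself contains a comma. A record that is not valid JSON after wrapping would be expected to come back as NULLs rather than raise an error.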

Hue demo: user id, password: demo, demo
