使用平展字段从嵌套的json数据创建配置单元表

bn31dyow  于 2021-06-27  发布在  Hive
关注(0)|答案(1)|浏览(372)

我想从嵌套的json数据创建外部配置单元表,但是字段应该从嵌套的json中展平。
为了example:-

{

    "key1":"value1",
    "key2":{
        "nestedKey1":1,
        "nestedKey2":2
    }

}

配置单元表的格式或字段应该像
key1:string,key2.nestedkey1:int,key2.nestedkey1:int
提前谢谢

ifmq2ha2

ifmq2ha21#

使用 JsonSerDe 并使用以下语法创建表:

hive> create table sample(key1 string,key2 struct<nestedKey1:int,nestedKey2:int>) 
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

hive> select key1,key2.nestedkey1,key2.nestedkey2 from sample;
+---------+-------------+-------------+--+
|  key1   | nestedkey1  | nestedkey2  |
+---------+-------------+-------------+--+
| value1  | 1           | 2           |
+---------+-------------+-------------+--+

hive> select * from sample;
+--------------+----------------------------------+--+
| sample.key1  |           sample.key2            |
+--------------+----------------------------------+--+
| value1       | {"nestedkey1":1,"nestedkey2":2}  |
+--------------+----------------------------------+--+

(或)
如果你想用“展平”创建表格 json fields 然后使用 RegexSerDe 匹配regex从数据中提取nestedkey。
有关regex serde的更多详细信息,请参阅此链接。
更新:
输入数据:

{"key1":"value1","key2":{"nestedKey1":1,"nestedKey2":2}}

Hive:

hive> CREATE  TABLE dd (key1 string, nestedKey1 string, nestedKey2 string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES 
('input.regex'=".*:\"(.*?)\",\"key2\":\\{\"nestedKey1\":(\\d),\"nestedKey2\":(\\d).*$");

从表中选择数据:

hive>  select * from dd;
+---------+-------------+-------------+--+
|  key1   | nestedkey1  | nestedkey2  |
+---------+-------------+-------------+--+
| value1  | 1           | 2           |
+---------+-------------+-------------+--+

相关问题