hadoop 具有 嵌套 ARRAY & STRUCT 组合 的 serde 表 的 查询 配置 单元

quhf5bfb  于 2022-11-21  发布在  Hadoop
关注(0)|答案(1)|浏览(200)

正在尝试查询基于json数据构建的json配置单元表。使用json 2 Hive可以生成DDL,并可以在删除不必要的字段后创建表。

create external table user_tables.sample_json_table (
  `apps` struct<
    `app`: array<struct<
        `id`: string,
        `queue`: string,
        `finalstatus`: string,
        `trackingurl`: string,
        `applicationtype`: string,
        `applicationtags`: string,
        `startedtime`: string,
        `launchtime`: string,
        `finishedtime`: string,
        `memoryseconds`: string,
        `vcoreseconds`: string,
        `resourcesecondsmap`: struct<
          `entry`: struct<
            `key`: string,
            `value`: string
          >
        >
      >
    >
  >
)
row format serde 'org.apache.hadoop.hive.serde2.JsonSerDe' 
location '/xyz/location/;

现在,一直在试图弄清楚如何从下面的模式中查询每个字段?检查了几篇文章,但它们都是特定于大小写的,需要一个通用的解释或示例来说明如何查询数组/结构下的每个字段:)
我只关心多个“应用程序”子部分条目,并希望将它们导入到另一个表中,每个字段都有单独的字段。
json数据示例:

{"apps":{"app":[{"id":"application_282828282828_12717","user":"xyz","name":"xyz-4b6bdae2-1a0c-4772-bd8e-0d7454268b82","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12717/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"ABC,xyz_20221107070124_2beb5d90-24c7-4b1b-b977-3c9af1397195,userid=dummy","priority":0,"startedtime":1667822485626,"launchtime":1667822485767,"finishedtime":1667822553365,"elapsedtime":67739,"amcontainerlogs":"http://dingdong:8042/node/containerlogs/container_e65_282828282828_12717_01_000001/xyz","amhosthttpaddress":"dingdong:8042","amrpcaddress":"dingdong:46457","masternodeid":"dingdong:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":1264304,"vcoreseconds":79,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"1264304"},"entry":{"key":"vcores","value":"79"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"succeeded","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}},{"id":"application_282828282828_12724","user":"xyz","name":"xyz-94962a3e-d230-4fd0-b68b-01b59dd3299d","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12724/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"ZZZ_,xyz_20221107070301_e6f788db-e39c-49b6-97d5-6a02ff994c00,userid=dummy","priority":0,"startedtime":1667822585231,"launchtime":1667822585437,"finishedtime":1667822631435,"elapsedtime":46204,"amcontainerlogs":"http://ding:8042/node/containerlogs/container_e65_282828282828_12724_01_000002/xyz","amhosthttpaddress":"ding:8042","amrpcaddress":"ding:46648","masternodeid":"ding:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":5603339,"vcoreseconds":430,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"5603339"},"entry":{"key":"vcores","value":"430"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"time_out","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}},{"id":"application_282828282828_12736","user":"xyz","name":"xyz-1a9c73ef-2992-40a5-aaad-9f0688bb04f4","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12736/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"BLAHBLAH,xyz_20221107070609_8d261352-3efa-46c5-a5a0-8a3cd745d180,userid=dummy","priority":0,"startedtime":1667822771170,"launchtime":1667822773663,"finishedtime":1667822820351,"elapsedtime":49181,"amcontainerlogs":"http://dong:8042/node/containerlogs/container_e65_282828282828_12736_01_000001/xyz","amhosthttpaddress":"dong:8042","amrpcaddress":"dong:34266","masternodeid":"dong:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":1300011,"vcoreseconds":89,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"1300011"},"entry":{"key":"vcores","value":"89"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"succeeded","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}},{"id":"application_282828282828_12735","user":"xyz","name":"xyz-d5f56a0a-9c6b-4651-8f88-6eaff5953777","queue":"root.users.dummy","state":"finished","finalstatus":"succeeded","progress":100.0,"trackingui":"history","trackingurl":"http://dang:8088/proxy/application_282828282828_12735/","diagnostics":"session stats:submitteddags=1, successfuldags=1, faileddags=0, killeddags=0\n","clusterid":282828282828,"applicationtype":"aquaman","applicationtags":"HAHAHA_,xyz_20221107070605_a082d9d8-912f-4278-a2ef-5dfe66089fd7,userid=dummy","priority":0,"startedtime":1667822766897,"launchtime":1667822766999,"finishedtime":1667822796759,"elapsedtime":29862,"amcontainerlogs":"http://dung:8042/node/containerlogs/container_e65_282828282828_12735_01_000001/xyz","amhosthttpaddress":"dung:8042","amrpcaddress":"dung:42765","masternodeid":"dung:8041","allocatedmb":-1,"allocatedvcores":-1,"reservedmb":-1,"reservedvcores":-1,"runningcontainers":-1,"memoryseconds":669695,"vcoreseconds":44,"queueusagepercentage":0.0,"clusterusagepercentage":0.0,"resourcesecondsmap":{"entry":{"key":"memory-mb","value":"669695"},"entry":{"key":"vcores","value":"44"}},"preemptedresourcemb":0,"preemptedresourcevcores":0,"numnonamcontainerpreempted":0,"numamcontainerpreempted":0,"preemptedmemoryseconds":0,"preemptedvcoreseconds":0,"preemptedresourcesecondsmap":{},"logaggregationstatus":"succeeded","unmanagedapplication":false,"amnodelabelexpression":"","timeouts":{"timeout":[{"type":"lifetime","expirytime":"unlimited","remainingtimeinseconds":-1}]}}]}}

示例查询输出:

id |  queue |  finalStatus | trackingurl |....
-----------------------------------------------------------
application_282828282828_12717 | root.users.dummy | succeeded | ...
application_282828282828_12724 | root.users.dummy2 | failed | ....
wwtsj6pe

wwtsj6pe1#

对于任何希望执行类似操作的人,我发现这篇文章非常有帮助,有明确的解释:https://community.cloudera.com/t5/Support-Questions/Complex-Json-transformation-using-Hive-functions/m-p/236476
下面是使用LATERAL VIEW EXPLODE解析的查询,以防人们在同一条船上:

select ex1.* from user_tables.sample_json_table cym LATERAL VIEW OUTER inline(cym.apps.app) ex1;
| id                              | queue             | finalstatus | trackingurl                                             | applicationtype | applicationtags                                                                         | startedtime   | launchtime    | finishedtime  | memoryseconds | vcoreseconds | resourcesecondsmap                       |
| ------------------------------- | ----------------- | ----------- | ------------------------------------------------------- | --------------- | --------------------------------------------------------------------------------------- | ------------- | ------------- | ------------- | ------------- | ------------ | ---------------------------------------- |
| application_1667627410794_12717 | root.users.dummy2 | succeeded   | http://dang:8088/proxy/application_1667627410794_12717/ | tez             | \_xyz,test-app-24c7-4b1b-b977-3c9af1397195,userid=dummy1                                | 1667822485626 | 1667822485767 | 1667822553365 | 1264304       | 79           | {"entry":{"key":"vcores","value":"79"}}  |
| application_1667627410794_12724 | root.users.dummy3 | succeeded   | http://dang:8088/proxy/application_1667627410794_12724/ | tez             | \_generate_stuff,hive_20221107070301_e6f788db-e39c-49b6-97d5-6a02ff994c00,userid=dummy3 | 1667822585231 | 1667822585437 | 1667822631435 | 5603339       | 430          | {"entry":{"key":"vcores","value":"430"}} |
| application_1667627410794_12736 | root.users.dummy1 | succeeded   | http://dang:8088/proxy/application_1667627410794_12736/ | tez             | \_sample_job,test-zzz-3efa-46c5-a5a0-8a3cd745d180,userid=dummy1                         | 1667822771170 | 1667822773663 | 1667822820351 | 1300011       | 89           | {"entry":{"key":"vcores","value":"89"}}  |
| application_1667627410794_12735 | root.users.dummy2 | succeeded   | http://dang:8088/proxy/application_1667627410794_12735/ | tez             | \_mixed_article,placebo_2-912f-4278-a2ef-5dfe66089fd7,userid=dummy2                     | 1667822766897 | 1667822766999 | 1667822796759 | 669695        | 44           | {"entry":{"key":"vcores","value":"44"}}  |

添加。注意:虽然我的需求不再需要它,但如果有人能建议如何进一步解析最后一个字段resourcesecondsmap来填充Map键值,那就太好了!基本上使用key值作为字段,value作为字段中的实际值:
所需输出:

| id                              | queue             | finalstatus | trackingurl                                             | applicationtype | applicationtags                                                                         | startedtime   | launchtime    | finishedtime  | memoryseconds | vcoreseconds | vcores-value |
| ------------------------------- | ----------------- | ----------- | ------------------------------------------------------- | --------------- | --------------------------------------------------------------------------------------- | ------------- | ------------- | ------------- | ------------- | ------------ | ------------ |
| application_1667627410794_12717 | root.users.dummy2 | succeeded   | http://dang:8088/proxy/application_1667627410794_12717/ | tez             | \_xyz,test-app-24c7-4b1b-b977-3c9af1397195,userid=dummy1                                | 1667822485626 | 1667822485767 | 1667822553365 | 1264304       | 79           | 79           |
| application_1667627410794_12724 | root.users.dummy3 | succeeded   | http://dang:8088/proxy/application_1667627410794_12724/ | tez             | \_generate_stuff,hive_20221107070301_e6f788db-e39c-49b6-97d5-6a02ff994c00,userid=dummy3 | 1667822585231 | 1667822585437 | 1667822631435 | 5603339       | 430          | 430          |

相关问题