当使用json serde读取json文件时,我只得到一行

lnlaulya  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(409)

json数据:

[{
  "liked": "true",
  "user_id": "101",
  "video_end_type": "3",
  "minutes_played": "3",
  "video_id": "101",
  "geo_cd": "AP",
  "channel_id": "11",
  "creator_id": "101",
  "timestamp": "07/05/2019 01:36:35",
  "disliked": "true"
},
{
"liked": true, 
"user_id": 102, 
"video_end_type": null,
 "minutes_played": 4,
 "video_id": 102, 
"geo_cd": "AP",
 "channel_id": 12,
 "creator_id": 102,
 "timestamp": "15/04/2019 17:04:00", 
"disliked": true
}
]

输出:

hive> select * from stream;
OK
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
true    101     3       3       101     AP      11      101     NULL    true
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

我得到的输出只是第一行。
代码:

add jar json-serde-1.3.8-jar-with-dependencies.jar;

create external table stream(
    liked string, user_id int, video_end_type int, minutes_played int, video_id int, geo_cd string, channel_id int, creator_id int, time timestamp, disliked 
string)
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION '/location';

注意:我不能删除括号,因为数据不是正确的json格式。

pbwdgjma

pbwdgjma1#

类似的方法也适用于你:
json数据:

{"liked": "true",  "user_id": "101",  "video_end_type": "3",  "minutes_played": "3",  "video_id": "101",  "geo_cd": "AP",  "channel_id": "11",  "creator_id": "101",  "timestamp": "07/05/2019 01:36:35",  "disliked": "true"}
{"liked": true, "user_id": 102, "video_end_type": null, "minutes_played": 4, "video_id": 102, "geo_cd": "AP", "channel_id": 12, "creator_id": 102, "timestamp": "15/04/2019 17:04:00", "disliked": true}

您的创建查询应如下所示:

create external table stream(
liked string,
user_id string,
video_end_type string,
minutes_played string,
video_id string,
geo_cd string,
channel_id string,
creator_id string,
`timestamp` string,
disliked string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/location/';

注意:如果json数据是一致的,那么可以使用适当的数据类型而不是字符串。

相关问题