我正在处理使用nifi从facebook获得的嵌套json数据。我已经在配置单元中创建了一个表,并使用命令加载了数据。
CREATE TABLE abmediaanalysis (id string, posts struct< data:array<struct< message:string, shares:struct< count:int>, id:string, reactions:struct< data:array<struct< name:string,id:string>>, paging:struct< cursors:struct< before:string,after:string>, next:string>>, likes:struct< data:array<struct< id:string>>, paging:struct< cursors:struct< before:string,after:string>, next:string>>>>, paging:struct< previous:string,next:string>>, feed struct< data:array<struct< permalink_url:string,message:string,id:string>>, paging:struct< previous:string,next:string>>)ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
load data local inpath '/home/10879/facebook1480479682880.json' overwrite into table abmediaanalysis;
我还添加了jar文件json-serde-1.3.8-jar-with-dependencies.jar,但是当我使用横向视图explode打印所有列时,得到了java堆大小错误。我也增加了堆的大小,但仍然有相同的错误
select id,posts_message,posts_share_count,posts_id,feed_data_permalink_url,feed_data_message,feed_data_id,reaction_data_name,reaction_data_id,posts_likes_data_id from abmediaanalysis
LATERAL VIEW explode(posts.data.message)MSG as posts_message
LATERAL VIEW explode(posts.data.shares.count)CT as posts_share_count
LATERAL VIEW explode(posts.data.id) I as posts_id
LATERAL VIEW explode(feed.data.permalink_url) PU as feed_data_permalink_url
LATERAL VIEW explode(feed.data.message) MSG as feed_data_message
LATERAL VIEW explode(feed.data.id) I as feed_data_id
LATERAL VIEW explode(posts.data.reactions) NM as posts_reactions_name
LATERAL VIEW explode(posts_reactions_name.data.name) NM as reaction_data_name
LATERAL VIEW explode(posts_reactions_name.data.id) NM as reaction_data_id
LATERAL VIEW explode(posts.data.likes) I as likes_data_id
LATERAL VIEW explode(likes_data_id.data.id) I as posts_likes_data_id;
当我试图打印两到三列而不是显示616条记录时,它显示了15625条记录。有人能帮助解决这个问题吗?有没有机会直接从nifi将上面的json数据加载到hive表中?如果是,你能告诉我吗。
提前谢谢
1条答案
按热度按时间nkkqxpd91#
如果我可以建议你另一种方法来加载整个
JSON
将字符串作为String
外部表中的数据类型。例如
使用Hive
get_json_object
提取单个列。例如如果json\u数据列有以下json字符串
下面的查询获取
退货
amy
这样你就可以提取每一个json
元素作为表中的列。