如何在hive中使用serde上传twitter json数据?

wswtfjt7  于 2021-06-02  发布在  Hadoop
关注(0)|答案(2)|浏览(363)

我使用twitter数据加载到hive中,并对其执行一些查询:我的tweeter数据(raw)是:(仅一种格式)-

{"created_at":"Tue Apr 28 23:28:15 +0000 2015","id":593195048306610176,"id_str":"593195048306610176","text":"Apple watch now has Tinder integration, now you can swipe on the go. This is revolutionary.","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":56632588,"id_str":"56632588","name":"Farmer Mike South","screen_name":"HunterPachell","location":"Bowling Green","url":"http:\/\/pornhub.com","description":"\u0394T\u0394 Bowling Green State University '16 BGSU Lax #2 See my latest highlights on http:\/\/pornhub.com","protected":false,"verified":false,"followers_count":439,"friends_count":997,"listed_count":1,"favourites_count":4548,"statuses_count":3702,"created_at":"Tue Jul 14 07:05:51 +0000 2009","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"050005","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_tile":false,"profile_link_color":"000000","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"95E8EC","profile_text_color":"3C3940","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/56632588\/1420260655","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1430263695204"}

我正在使用外部表配置单元架构:

CREATE  External TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
retweet_count INT,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>,
  in_reply_to_screen_name STRING
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/user/hastimal/tweets';

但是当我做一些简单的事情比如
从tweets limit 1中选择*;
显示错误:

hive> select * from tweets limit 1;

OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.json.JSONObject cannot be cast to [Ljava.lang.Object;

另外,我尝试了google/stackoverflow中的所有可用功能,发现了一些东西,可以将linkedinjson响应加载到hive中
但不起作用。请帮忙。。。。。

lb3vh1jj

lb3vh1jj1#

我觉得jar有问题。从下面的链接下载hiveserdejar

files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar

添加jar文件

hive>  add jar  hive-serdes-1.0-SNAPSHOT.jar

----创建表---行格式serde'com.cloudera.hive.serde.jsonserde'如下

CREATE  External TABLE tweetsjson (
 id BIGINT,
 created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
) 
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/hastimal/tweets';

如果不行就告诉我

prdp8dxp

prdp8dxp2#

创建外部表时需要传递jar
添加jar(目录)

相关问题