Hadoop Hive keeps freezing when I run a CREATE request with many objects

fnvucqvd posted on 2021-06-02 in Hadoop

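My Hive works when I create simple tables, but whenever I try to run a CREATE TABLE that extracts a large number of objects, it freezes right after printing the following: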

Query ID = root_20160321031616_6fbfd536-f3e5-4517-ab8b-2dc8ddb34b85
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1458530057671_0001, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1458530057671_0001/
Kill Command = /usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop job  -kill job_1458530057671_0001

I don't remember whether the "...no reduce operator" line appeared back when it was working.
The code I'm trying to run is relatively simple:

create table BMO_F069_table as
select
    get_json_object(BMO_F069.json, '$.text') as text,
    get_json_object(BMO_F069.json, '$.in_reply_to_user_id') as in_reply_to_user_id,
    get_json_object(BMO_F069.json, '$.id') as id,
    get_json_object(BMO_F069.json, '$.favorite_count') as favorite_count,
    get_json_object(BMO_F069.json, '$.coordinates') as coordinates,
    get_json_object(BMO_F069.json, '$.id_str') as id_str,
    get_json_object(BMO_F069.json, '$.user.location') as location,
    get_json_object(BMO_F069.json, '$.lang') as lang,
    get_json_object(BMO_F069.json, '$.indices') as indices,
    get_json_object(BMO_F069.json, '$.type') as type,
    get_json_object(BMO_F069.json, '$.hashtags') as hashtags,
    get_json_object(BMO_F069.json, '$.user_mentions') as user_mentions,
    get_json_object(BMO_F069.json, '$.user.screen_name') as screen_name,
    get_json_object(BMO_F069.json, '$.user.name') as name,
    get_json_object(BMO_F069.json, '$.in_reply_to_screen_name') as in_reply_to_screen_name,
    get_json_object(BMO_F069.json, '$.retweet_count') as retweet_count,
    get_json_object(BMO_F069.json, '$.favorited') as favorited,
    get_json_object(BMO_F069.json, '$.retweeted_status') as retweeted_status,
    get_json_object(BMO_F069.json, '$.user') as user,
    get_json_object(BMO_F069.json, '$.followers_count') as followers_count,
    get_json_object(BMO_F069.json, '$.statuses_count') as statuses_count,
    get_json_object(BMO_F069.json, '$.description') as description,
    get_json_object(BMO_F069.json, '$.geo_enabled') as geo_enabled,
    get_json_object(BMO_F069.json, '$.favourites_count') as favourites_count,
    get_json_object(BMO_F069.json, '$.created_at') as created_at,
    get_json_object(BMO_F069.json, '$.time_zone') as time_zone,
    get_json_object(BMO_F069.json, '$.listed_count') as listed_count,
    get_json_object(BMO_F069.json, '$.in_reply_to_user_id_str') as in_reply_to_user_id_str
from BMO_F069;

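The data is 60 MB. Unfortunately, I don't know enough about the cluster to give specific specs, sorry, but I'd still appreciate any feedback. Over the past few weeks I've run hundreds of similar queries on data as large as half a terabyte without any problems. When it freezes partway through a job, it also blocks every job submitted after it. Is there a way to reset it?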
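A hung Hive query is still a YARN application, so one way to "reset" things is to find that application and kill it. Below is a minimal sketch, assuming the Hadoop 2.x `yarn` CLI is on the PATH and that `yarn application -list` prints tab-separated rows whose first column is the application ID and sixth column is the state (the layout the awk below expects). `list_stuck_apps` is a hypothetical helper name, not part of Hadoop.

```sh
# Hypothetical helper: reads `yarn application -list` output on stdin and
# prints "<application-id> <state>" for every app still ACCEPTED or RUNNING.
# Assumption: rows are tab-separated, with the ID in column 1 and the state
# in column 6, both possibly padded with spaces.
list_stuck_apps() {
  awk -F'\t' '
    { id = $1; gsub(/ /, "", id); st = $6; gsub(/ /, "", st) }
    id ~ /^application_[0-9]+_[0-9]+$/ && (st == "ACCEPTED" || st == "RUNNING") {
      print id, st
    }
  '
}
```

For example, `yarn application -list -appStates ACCEPTED,RUNNING | list_stuck_apps` prints the candidates, and `yarn application -kill application_1458530057671_0001` stops the job from the log above. The `hadoop job -kill` line Hive printed is another way to do the same. If every newly submitted query sits in ACCEPTED, that typically means YARN has no free containers left for it, which matches the symptom of one hung job blocking everything after it.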
When I run hive from the terminal, I get the following startup messages. Is this normal? I don't remember seeing them before.

16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
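These four lines say that the configuration Hive loaded sets `hive.*` properties that this Hive build does not define. As far as I know, HiveConf only logs a warning for such names and carries on, so they are unlikely to be what blocks the job. A minimal sketch for finding where they are set, assuming they come from a `hive-site.xml` in which each `<name>` element sits on a single line; the default path `/etc/hive/conf/hive-site.xml` is an assumption about an HDP client install, and `find_warned_props` is a hypothetical helper name.

```sh
# Hypothetical helper: shows where the property names from the WARN lines are
# set in a Hive config file. The default path is an assumption (typical HDP
# client config location); pass another file as the first argument.
find_warned_props() {
  conf_file=${1:-/etc/hive/conf/hive-site.xml}
  for prop in hive.optimize.mapjoin.mapreduce hive.heapsize \
      hive.server2.enable.impersonation \
      hive.auto.convert.sortmerge.join.noconditionaltask; do
    # -F: match the name literally (dots are not regex wildcards here).
    # "|| true": a property that is not in the file is not an error.
    grep -n -F "<name>$prop</name>" "$conf_file" || true
  done
}
```

Each hit is printed as `line-number:matching-line`, which tells you which `<property>` entries to look at.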

Thanks a lot for your help.

cuxqih21


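When you launch a badly unoptimized job, Hive will still try to complete it, however long that takes.
Since you haven't given any useful information about your cluster specs, data volume, or query... my guess is that either your query is poorly written or your cluster lacks the resources to finish the request in a reasonable time.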
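On the query side, one possible rewrite (not necessarily the cause of the hang) replaces the 28 `get_json_object` calls with `json_tuple`, which the Hive documentation describes as more efficient than calling `get_json_object` once per key on the same JSON string. `json_tuple` takes top-level key names rather than JSON paths, so the three `user.*` fields come from a second `LATERAL VIEW` over the `user` object. The statement is wrapped in a hypothetical shell function `bmo_f069_ctas_sql` only so it can be handed to `hive -e`; `user` is backquoted in case it is a reserved word in your Hive version. This is a sketch: check that nested values such as `user` and `retweeted_status` come back in the same form as with the original query.

```sh
# Hypothetical helper: prints a json_tuple-based version of the CTAS statement.
# json_tuple takes top-level key names, so the nested user.* fields are pulled
# from the user object with a second LATERAL VIEW.
bmo_f069_ctas_sql() {
  cat <<'EOF'
CREATE TABLE BMO_F069_table AS
SELECT
  t.text, t.in_reply_to_user_id, t.id, t.favorite_count, t.coordinates, t.id_str,
  u.location, t.lang, t.indices, t.type, t.hashtags, t.user_mentions,
  u.screen_name, u.name, t.in_reply_to_screen_name, t.retweet_count, t.favorited,
  t.retweeted_status, t.user_json AS `user`, t.followers_count, t.statuses_count,
  t.description, t.geo_enabled, t.favourites_count, t.created_at, t.time_zone,
  t.listed_count, t.in_reply_to_user_id_str
FROM BMO_F069
LATERAL VIEW json_tuple(BMO_F069.json,
  'text', 'in_reply_to_user_id', 'id', 'favorite_count', 'coordinates', 'id_str',
  'lang', 'indices', 'type', 'hashtags', 'user_mentions', 'in_reply_to_screen_name',
  'retweet_count', 'favorited', 'retweeted_status', 'user', 'followers_count',
  'statuses_count', 'description', 'geo_enabled', 'favourites_count', 'created_at',
  'time_zone', 'listed_count', 'in_reply_to_user_id_str') t
  AS text, in_reply_to_user_id, id, favorite_count, coordinates, id_str,
     lang, indices, type, hashtags, user_mentions, in_reply_to_screen_name,
     retweet_count, favorited, retweeted_status, user_json, followers_count,
     statuses_count, description, geo_enabled, favourites_count, created_at,
     time_zone, listed_count, in_reply_to_user_id_str
LATERAL VIEW json_tuple(t.user_json, 'location', 'screen_name', 'name') u
  AS location, screen_name, name;
EOF
}
```

To run it: `hive -e "$(bmo_f069_ctas_sql)"`. The output columns keep the same names and order as the original CTAS.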
