We have switched the Hive execution engine from MapReduce to Spark, and have tried connecting with both Beeline and JDBC.
We can run simple queries (e.g. select * from table) because they do not need to process any data, but when we try to run a query containing an aggregate function (e.g. select count(*) from table) we get the following error:
Query ID = hadoop_20180105123047_5bcd0d7a-78bd-4b66-b5fb-fc430726c2a9
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
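"Failed to create spark client" usually means HiveServer2 could not launch the remote Spark driver, which in practice often comes down to the Spark assembly/jars not being on Hive's classpath, or the Spark master and related properties not being set. A minimal sketch of the relevant hive-site.xml settings (property names are the standard Hive-on-Spark ones; the values shown are illustrative and depend on your cluster):

```xml
<!-- Illustrative Hive-on-Spark configuration in hive-site.xml.
     Values below are examples only; adjust to your cluster. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
```

The same properties can also be set per session from Beeline with `set hive.execution.engine=spark;` before running the query.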
What could be the problem?
1 answer
The first query works because it does not need to run any MR or Spark job: HiveServer2 (or the Hive client) reads the data directly. The second query requires an MR or Spark job to run. This is a key point to keep in mind when testing or troubleshooting a cluster.
Can you run a Spark job outside of Hive?
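One common way to check this, assuming a standard Spark installation with the bundled examples jar (the jar path below is illustrative and varies by install), is to submit the SparkPi example directly:

```shell
# Submit the bundled SparkPi example to YARN to confirm Spark itself works
# independently of Hive. Adjust the examples jar path to your installation.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar 10
```

If this job also fails, the problem is in the Spark/YARN setup itself rather than in Hive's integration with it; if it succeeds, the issue is likely in Hive's Spark client configuration or classpath.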