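[There was an initial question about OOM on Tez/Hive, but after some answers and comments, a new question incorporating what was learned is warranted.]
I have a query with a big lateral view. It joins 4 tables, all ORC compressed and bucketed on the same column. It looks like this: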
select
10 fields from t
, 80 fields from the lateral view
from
(
select
10 fields
from
e (800M rows, 7GB of data, 1 bucket)
LEFT JOIN m (1M rows, 20MB )
LEFT JOIN c (2k rows, <1MB)
LEFT JOIN contact (150M rows, 283GB, 4 buckets)
) t
LATERAL VIEW
json_tuple (80 fields) as lv
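To make the shape of the lateral view concrete, here is a minimal sketch. The column names (id, payload), the JSON keys (key_1 to key_3), the aliases (f1 to f3) and the join condition are all placeholders I made up for illustration; the real query joins all 4 tables and extracts 80 keys.

```sql
-- Sketch only: id, payload, key_1..key_3 and f1..f3 are hypothetical names.
-- The real query selects 10 fields from the 4-way join and extracts 80 JSON keys.
select
  t.id,
  lv.f1, lv.f2, lv.f3
from (
  select e.id, contact.payload          -- stand-in for the 10 joined fields
  from e
  left join contact on e.id = contact.id  -- assumed join key
) t
lateral view
  json_tuple(t.payload, 'key_1', 'key_2', 'key_3') lv as f1, f2, f3;
```

json_tuple returns every extracted value as a string, so each of the 80 output columns is a string column on every row.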
If I remove the lateral view, the query completes. If I add the lateral view, I always end up with:
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1516602562532_3606_2_03, diagnostics=[Task failed, taskId=task_1516602562532_3606_2_03_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_e113_1516602562532_3606_01_000008 finished with diagnostics set to [Container failed, exitCode=255. Exception from container-launch.
Container id: container_e113_1516602562532_3606_01_000008
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 255
]], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
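I have tried a lot of things:
Updating all the tez.grouping.* settings.
Adding where conditions in the joins as well.
set hive.auto.convert.join.noconditionaltask = false; to make sure no map join is attempted.
Adding distribute by on different columns to prevent possible skew.
set mapred.map.tasks=100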
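I have already maxed out all the Java opts and memory settings.
I need to keep the lateral view because some of its fields might be used for filtering (i.e., I cannot just do some nifty string manipulation to output a CSV-like table).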
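For readers unfamiliar with these knobs, this is roughly what such session settings look like. The values below are illustrative placeholders, not the exact ones used in the failing runs.

```sql
-- Illustrative values only; not the values from the failing runs.

-- Split grouping: smaller groups should produce more, smaller map tasks.
set tez.grouping.min-size=16777216;    -- 16 MB
set tez.grouping.max-size=134217728;   -- 128 MB

-- Prevent the joins from being merged into a single map join.
set hive.auto.convert.join.noconditionaltask=false;

-- Hint for the number of map tasks.
set mapred.map.tasks=100;

-- Container size (MB) and the JVM heap inside it (about 80% of the container).
set hive.tez.container.size=8192;
set hive.tez.java.opts=-Xmx6554m;
```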
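Is there a way to make the lateral view fit in memory, or to split it across multiple mappers? This is the Tez UI view:
[Tez UI screenshot]
Environment: HDP 2.6, 8 datanodes, 32 GB RAM.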