I am trying to join two ORC tables in Hive, but I get an error. Here is the query:
select t1.num as num,
       t1.product as Product,
       t2.value as OldValue,
       t1.value as NewValue
from test_new t1
LEFT OUTER JOIN test_old t2
  ON t1.num=t2.num and t1.product=t2.product
where t2.value is NULL and t1.value is not NULL
   or t1.value<>t2.value;
Error:
2017-05-29 11:19:27,157 INFO [main]: mr.ExecDriver (SessionState.java:printInfo(911)) - Execution log at: /tmp/alex/kaliamoorthya_20170529111919_6621dd64-7a5e-4411-abda-b28fddab8bdc.log
2017-05-29 11:19:27,320 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-05-29 11:19:27,321 INFO [main]: exec.Utilities (Utilities.java:deserializePlan(953)) - Deserializing MapredLocalWork via kryo
2017-05-29 11:19:27,462 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=deserializePlan start=1496056767320 end=1496056767462 duration=142 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-05-29 11:19:27,472 INFO [main]: mr.MapredLocalTask (SessionState.java:printInfo(911)) - 2017-05-29 11:19:27 Starting to launch local task to process map join; maximum memory = 1908932608
2017-05-29 11:19:27,549 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for t2 created
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[0]
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 0 TS initialized
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 0 TS
2017-05-29 11:19:27,550 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2017-05-29 11:19:27,550 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(346)) - Initializing Self HASHTABLESINK[1]
2017-05-29 11:19:27,551 INFO [main]: mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(61)) - JVM Max Heap Size: 1908932608
2017-05-29 11:19:27,582 INFO [main]: persistence.HashMapWrapper (HashMapWrapper.java:calculateTableSize(94)) - Key count from statistics is -1; setting map size to 100000
2017-05-29 11:19:27,582 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(394)) - Initialization Done 1 HASHTABLESINK
2017-05-29 11:19:27,582 INFO [main]: exec.TableScanOperator (Operator.java:initialize(394)) - Initialization Done 0 TS
2017-05-29 11:19:27,582 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(461)) - fetchoperator for t2 initialized
2017-05-29 11:19:28,059 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2017-05-29 11:19:28,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2017-05-29 11:19:28,098 INFO [main]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(961)) - FooterCacheHitRatio: 0/4
2017-05-29 11:19:28,098 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=OrcGetSplits start=1496056768062 end=1496056768098 duration=36 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2017-05-29 11:19:28,209 INFO [main]: orc.OrcRawRecordMerger (OrcRawRecordMerger.java:<init>(430)) - min key = null, max key = null
2017-05-29 11:19:28,209 INFO [main]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(526)) - Reading ORC rows from hdfs://nameservice1/user/hive/warehouse/alex_tmp.db/test_old/000000_0 with {include: [true, true, true, true], offset: 0, length: 9223372036854775807}
2017-05-29 11:19:28,646 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 200000 Hashtable size: 199999 Memory usage: 130784248 percentage: 0.069
2017-05-29 11:19:28,708 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 300000 Hashtable size: 299999 Memory usage: 159462144 percentage: 0.084
2017-05-29 11:19:28,784 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 400000 Hashtable size: 399999 Memory usage: 207258624 percentage: 0.109
2017-05-29 11:19:28,843 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 500000 Hashtable size: 499999 Memory usage: 235936520 percentage: 0.124
2017-05-29 11:19:28,903 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 600000 Hashtable size: 599999 Memory usage: 274173712 percentage: 0.144
2017-05-29 11:19:28,965 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 700000 Hashtable size: 699999 Memory usage: 312410896 percentage: 0.164
2017-05-29 11:19:29,059 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 800000 Hashtable size: 799999 Memory usage: 359036720 percentage: 0.188
2017-05-29 11:19:29,126 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 900000 Hashtable size: 899999 Memory usage: 397273912 percentage: 0.208
2017-05-29 11:19:29,196 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 425951800 percentage: 0.223
2017-05-29 11:19:29,263 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 464188992 percentage: 0.243
2017-05-29 11:19:29,333 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 502426176 percentage: 0.263
2017-05-29 11:19:29,401 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 540663360 percentage: 0.283
2017-05-29 11:19:32,752 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 485809696 percentage: 0.254
2017-05-29 11:19:32,817 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 524582216 percentage: 0.275
2017-05-29 11:19:32,937 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 580131976 percentage: 0.304
2017-05-29 11:19:32,998 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 618904496 percentage: 0.324
2017-05-29 11:19:33,061 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 647983888 percentage: 0.339
2017-05-29 11:19:33,124 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 686756400 percentage: 0.36
2017-05-29 11:19:33,188 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 725528920 percentage: 0.38
2017-05-29 11:19:33,253 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2100000 Hashtable size: 2099999 Memory usage: 764301440 percentage: 0.40
2017-05-29 11:19:33,316 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2200000 Hashtable size: 2199999 Memory usage: 793380824 percentage: 0.416
2017-05-29 11:19:33,380 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2300000 Hashtable size: 2299999 Memory usage: 832153336 percentage: 0.436
2017-05-29 11:19:33,445 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2400000 Hashtable size: 2399999 Memory usage: 870925856 percentage: 0.456
2017-05-29 11:19:33,510 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2500000 Hashtable size: 2499999 Memory usage: 909698376 percentage: 0.477
2017-05-29 11:19:33,574 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2600000 Hashtable size: 2599999 Memory usage: 938777776 percentage: 0.492
2017-05-29 11:19:38,930 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38 Processing rows: 2700000 Hashtable size: 2699999 Memory usage: 924140056 percentage: 0.484
2017-05-29 11:19:38,996 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38 Processing rows: 2800000 Hashtable size: 2799999 Memory usage: 960610440 percentage: 0.503
2017-05-29 11:19:39,063 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 2900000 Hashtable size: 2899999 Memory usage: 997080808 percentage: 0.522
2017-05-29 11:19:39,134 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3000000 Hashtable size: 2999999 Memory usage: 1033551200 percentage: 0.541
2017-05-29 11:19:39,203 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3100000 Hashtable size: 3099999 Memory usage: 1070021576 percentage: 0.561
2017-05-29 11:19:39,392 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3200000 Hashtable size: 3199999 Memory usage: 1140046400 percentage: 0.597
2017-05-29 11:19:39,456 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3300000 Hashtable size: 3299999 Memory usage: 1176516784 percentage: 0.616
2017-05-29 11:19:39,519 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3400000 Hashtable size: 3399999 Memory usage: 1212987168 percentage: 0.635
2017-05-29 11:19:39,583 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3500000 Hashtable size: 3499999 Memory usage: 1249457552 percentage: 0.655
2017-05-29 11:19:39,646 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3600000 Hashtable size: 3599999 Memory usage: 1285927936 percentage: 0.674
2017-05-29 11:19:39,710 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3700000 Hashtable size: 3699999 Memory usage: 1322398320 percentage: 0.693
2017-05-29 11:19:39,774 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3800000 Hashtable size: 3799999 Memory usage: 1358868704 percentage: 0.712
2017-05-29 11:19:39,837 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3900000 Hashtable size: 3899999 Memory usage: 1395339088 percentage: 0.731
2017-05-29 11:19:39,904 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 4000000 Hashtable size: 3999999 Memory usage: 1431809456 percentage: 0.75
2017-05-29 11:19:39,973 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 4100000 Hashtable size: 4099999 Memory usage: 1468279832 percentage: 0.769
2017-05-29 11:19:40,041 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40 Processing rows: 4200000 Hashtable size: 4199999 Memory usage: 1504750200 percentage: 0.788
2017-05-29 11:19:40,113 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40 Processing rows: 4300000 Hashtable size: 4299999 Memory usage: 1538933512 percentage: 0.806
2017-05-29 11:19:48,786 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4400000 Hashtable size: 4399999 Memory usage: 1496365384 percentage: 0.784
2017-05-29 11:19:48,850 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4500000 Hashtable size: 4499999 Memory usage: 1532580448 percentage: 0.803
2017-05-29 11:19:48,915 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4600000 Hashtable size: 4599999 Memory usage: 1568795512 percentage: 0.822
2017-05-29 11:19:48,979 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4700000 Hashtable size: 4699999 Memory usage: 1605010584 percentage: 0.841
2017-05-29 11:19:49,044 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 4800000 Hashtable size: 4799999 Memory usage: 1641225648 percentage: 0.86
2017-05-29 11:19:49,108 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 4900000 Hashtable size: 4899999 Memory usage: 1677440712 percentage: 0.879
2017-05-29 11:19:49,171 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 5000000 Hashtable size: 4999999 Memory usage: 1713655784 percentage: 0.898
2017-05-29 11:19:49,235 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 5100000 Hashtable size: 5099999 Memory usage: 1749870856 percentage: 0.917
2017-05-29 11:19:49,246 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(354)) - Hive Runtime Error: Map local work exhausted memory
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2017-05-29 11:19:49 Processing rows: 5100000 Hashtable size: 5099999 Memory usage: 1749870856 percentage: 0.917
at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:99)
at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:249)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:409)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:380)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:346)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:743)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
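I tried setting the map memory and the reduce memory to 22000, but still had no luck. After searching online, I found a suggestion to set the property hive.auto.convert.join = false to get past the error above, and with that my query started running.
I am not sure whether running the query this way costs me any performance. Will the performance still be the same? Is there any other way to solve this problem? Please give me some suggestions for improving the query's performance.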
1 Answer
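The first and safest option is to set hive.auto.convert.join to false. This costs some performance, because you no longer benefit from the map join. How big that compromise is depends entirely on your use case and the size of your data. In your log, the local task was building an in-memory hash table from test_old (t2) and aborted once it had used about 92% of its 1.9 GB heap, so that table is too large for a map join with the current settings.
Another option is to use hive.auto.convert.join.noconditionaltask.size, which, according to https://cwiki.apache.org/confluence/display/hive/languagemanual+joinoptimization, "allows the user to control what size table can fit in memory". Finding a suitable threshold can be a challenge.
P.S. Keep in mind that for hive.auto.convert.join.noconditionaltask.size to take effect, hive.auto.convert.join.noconditionaltask needs to be true (it is true by default).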
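As a rough sketch, the two options could be applied as session-level settings before running the query. The threshold below is only an illustrative value (an assumption, not a tuned recommendation). The size is given in bytes, and a workable number depends on your data and on the heap available to the local task.

```sql
-- Option 1: turn off automatic map-join conversion for this session,
-- so Hive falls back to a regular shuffle (reduce-side) join.
SET hive.auto.convert.join=false;

-- Option 2 (alternative to option 1): keep automatic conversion,
-- but cap the small-table size that Hive converts directly to a map join.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
-- Illustrative threshold in bytes; adjust for your own tables.
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Then run the query from the question in the same session.
```

Running `SET hive.auto.convert.join;` with no value prints the current setting, which is a quick way to confirm what the session is actually using.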