我的一个es节点失败,原因是 java.lang.OutOfMemoryError: Java heap space
错误。以下是日志中的完整堆栈跟踪:
[2020-09-18T04:25:04,215][WARN ][o.e.a.b.TransportShardBulkAction] [search1] [[my_index_4][0]] failed to perform indices:data/write/bulk[s] on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk[s][r]] disconnected
[2020-09-18T04:25:04,215][WARN ][o.e.c.a.s.ShardStateAction] [search1] [my_index_4][0] received shard failed for shard id [[my_index_4][0]], allocation id [BUpviwHxQK2qC3GrELC2Hw], primary term [2], message [failed to perform indices:data/write/bulk[s] on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]], failure [NodeDisconnectedException[[search3][X.X.X.179:9300][indices:data/write/bulk[s][r]] disconnected]]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk[s][r]] disconnected
[2020-09-18T04:25:04,215][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [search1] failed to execute on node [cm_76wfGRFm9nbPR1mJxTQ]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][cluster:monitor/nodes/info[n]] disconnected
[2020-09-18T04:25:04,219][INFO ][o.e.c.r.a.AllocationService] [search1] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[my_index_4][0]] ...]).
[2020-09-18T04:25:05,450][INFO ][o.e.m.j.JvmGcMonitorService] [search1] [gc][11099506] overhead, spent [605ms] collecting in the last [1.4s]
[2020-09-18T04:25:05,453][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [search1] fatal error in thread [elasticsearch[search1][search][T#5]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource$GlobalOrdinalValuesSource.<init>(CompositeValuesSource.java:137) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource.wrapGlobalOrdinals(CompositeValuesSource.java:123) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesComparator.<init>(CompositeValuesComparator.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregator.<init>(CompositeAggregator.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationFactory.createInternal(CompositeAggregationFactory.java:52) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:55) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:105) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$14(IndicesService.java:1133) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService$$Lambda$2241/341562582.accept(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$15(IndicesService.java:1186) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService$$Lambda$2242/1286052129.get(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:412) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1192) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1132) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:305) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:340) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:316) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:312) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1002) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
由于上述例外,我得到 master_not_discovered_exception
当我点击任何一个esapi的时候。
问:有人能告诉我下一步我应该做什么来让elasticsearch恢复正常状态吗?有没有办法重新启动断开连接的节点?
2条答案
按热度按时间uhry853o1#
这个
java.lang.OutOfMemoryError: Java heap space
是由运行我为其设置size
参数到Integer.MAX_VALUE
:根据堆栈跟踪,初始化聚合值数组时出错
CompositeValuesSource.java:137
:这里,那个
size
参数来自查询。答案https://stackoverflow.com/a/63965634/5284890 确认根本原因。
我的下一步是使用以下命令停止并再次运行elasticsearcch
我的以下步骤将是检查建议断路器在es文章中提到的这个答案https://stackoverflow.com/a/63965634/5284890.
2mbi3lxu2#
首先让我简单解释一下是什么导致了这个问题:
如日志中所述,您似乎正在运行代价高昂的聚合,通常是内存密集型的,并且已知会消耗大量内存,而垃圾收集(gc)无法回收这些内存,最终您的应用程序耗尽内存并被杀死。
除了在日志中显示的昂贵的聚合之外,大量的搜索和索引请求也会导致高内存消耗,因此请查看此节点的搜索和索引慢速日志,有关详细信息,请参阅es慢速日志
现在是决议部分
此es节点已死亡,这导致
master_not_discovered_exception
因此,重新启动此节点并查看是否出现此异常非常重要。有关此异常的详细信息,请参阅opster的博客。防止oom异常
您应该正确配置es中可用的断路器,如果可能的话,可以升级到es 7.x,后者基于实际内存具有更好的断路器
提高es索引和搜索性能。