After changing the Saga example from camel-spring-boot-examples/saga so that it successfully executes many Saga transactions, the narayana-lra server fails and restarts with java.lang.OutOfMemoryError: Java heap space.
Using Camel (org.apache.camel.springboot.example) version 3.20.0.
pom.xml
<parent>
    <groupId>org.apache.camel.springboot</groupId>
    <artifactId>spring-boot</artifactId>
    <version>3.20.0</version>
</parent>
<groupId>org.apache.camel.springboot.example</groupId>
<artifactId>examples</artifactId>
<name>Camel SB :: Examples</name>
<description>Camel Examples</description>
<packaging>pom</packaging>
<properties>
    <camel-version>3.20.0</camel-version>
    <skip.starting.camel.context>false</skip.starting.camel.context>
    <javax.servlet.api.version>4.0.1</javax.servlet.api.version>
    <jkube-maven-plugin-version>1.9.1</jkube-maven-plugin-version>
    <kafka-avro-serializer-version>5.2.2</kafka-avro-serializer-version>
    <reactor-version>3.2.16.RELEASE</reactor-version>
    <testcontainers-version>1.16.3</testcontainers-version>
    <hapi-structures-v24-version>2.3</hapi-structures-v24-version>
    <narayana-spring-boot-version>2.6.3</narayana-spring-boot-version>
</properties>
Docker Compose file:
version: "3.9"
services:
  lra-coordinator:
    image: "quay.io/jbosstm/lra-coordinator:7.0.0.Final-3.2.2.Final"
    network_mode: "host"
    deploy:
      resources:
        limits:
          memory: 400M
    environment:
      - 'JAVA_TOOL_OPTIONS=-Dquarkus.log.level=DEBUG
        -Dcom.sun.management.jmxremote=true
        -Dcom.sun.management.jmxremote.port=7091
        -Dcom.sun.management.jmxremote.ssl=false
        -Dcom.sun.management.jmxremote.authenticate=false'
  amq-broker:
    image: "registry.redhat.io/amq7/amq-broker-rhel8:7.10"
    environment:
      - AMQ_USER=admin
      - AMQ_PASSWORD=admin
      - AMQ_REQUIRE_LOGIN=true
    ports:
      - "8161:8161"
      - "61616:61616"
Modified SagaRoute:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.model.rest.RestParamType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class SagaRoute extends RouteBuilder {

    private static final String DIRECT_SAGA = "direct:saga";

    @Autowired
    private ProducerTemplate producerTemplate;

    // Fixed pool of 10 threads used to fire the sagas concurrently
    private final ExecutorService executor = Executors.newFixedThreadPool(10);

    @Override
    public void configure() throws Exception {
        // GET /perf?n=<count> starts <count> sagas in the background
        rest().get("/perf")
            .param().type(RestParamType.query).name("n").dataType("int").required(true).endParam()
            .to("direct:perf");

        from("direct:perf")
            .process(exchange -> {
                Integer limit = exchange.getMessage().getHeader("n", Integer.class);
                for (int i = 0; i < limit; i++) {
                    int finalI = i;
                    // One task per saga; the loop index is used as both body and "id" header
                    executor.submit(() -> producerTemplate.sendBodyAndHeader(DIRECT_SAGA, finalI, "id", finalI));
                }
            });

        rest().post("/saga")
            .param().type(RestParamType.query).name("id").dataType("int").required(true).endParam()
            .to(DIRECT_SAGA);

        // Runs inside a saga (LRA); direct:cancelOrder is the compensation endpoint if it is cancelled
        from(DIRECT_SAGA)
            .saga()
            .compensation("direct:cancelOrder")
            .log("Executing saga #${header.id} with LRA ${header.Long-Running-Action}")
            .setHeader("payFor", constant("train"))
            .to("activemq:queue:{{example.services.train}}?exchangePattern=InOut"
                + "&replyTo={{example.services.train}}.reply")
            .log("train seat reserved for saga #${header.id} with payment transaction: ${body}")
            .setHeader("payFor", constant("flight"))
            .to("activemq:queue:{{example.services.flight}}?exchangePattern=InOut"
                + "&replyTo={{example.services.flight}}.reply")
            .log("flight booked for saga #${header.id} with payment transaction: ${body}")
            .setBody(header("Long-Running-Action"))
            .end();

        from("direct:cancelOrder")
            .log("Transaction ${header.Long-Running-Action} has been cancelled due to flight or train failure");
    }
}
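Modified PaymentRoute (original: https://github.com/apache/camel-spring-boot-examples/blob/camel-spring-boot-examples-3.20.0/saga/saga-payment-service/src/main/java/org/apache/camel/example/saga/PaymentRoute.java):
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.model.SagaPropagation;
import org.springframework.stereotype.Component;

@Component
public class PaymentRoute extends RouteBuilder {

    @Override
    public void configure() throws Exception {
        // Joins the caller's saga; MANDATORY throws if no saga is active
        from("activemq:queue:{{example.services.payment}}")
            .routeId("payment-service")
            .saga()
            .propagation(SagaPropagation.MANDATORY)
            // Stored as a saga option so the id header is available again in the compensation route
            .option("id", header("id"))
            .compensation("direct:cancelPayment")
            .log("Paying ${header.payFor} for order #${header.id}")
            // The JMS correlation id is returned as the payment transaction id
            .setBody(header("JMSCorrelationID"))
            .log("Payment ${header.payFor} done for order #${header.id} with payment transaction ${body}")
            .end();

        // Compensation, triggered through the LRA coordinator if the saga is cancelled
        from("direct:cancelPayment")
            .routeId("payment-cancel")
            .log("Payment for order #${header.id} has been cancelled");
    }
}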
Run run-local.sh to start the services.
Then run the command below, passing the number of transactions to execute:
curl http://localhost:8084/api/perf?n=100000
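For reference, the same load can be triggered from Java with the JDK's java.net.http.HttpClient (Java 11+). This is a minimal sketch: the host, port and /api path are copied from the curl command above, and the class name PerfTrigger is made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: Java equivalent of "curl http://localhost:8084/api/perf?n=100000"
class PerfTrigger {

    /** Builds the URI of the /perf endpoint that starts n sagas. */
    static URI perfUri(String host, int port, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return URI.create("http://" + host + ":" + port + "/api/perf?n=" + n);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        HttpClient client = HttpClient.newHttpClient();
        // curl issues a GET by default, matching rest().get("/perf") in SagaRoute
        HttpRequest request = HttpRequest.newBuilder(perfUri("localhost", 8084, n)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```

The /perf route only submits the tasks to the executor, so the response returns before the sagas themselves complete.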
Running this, Java Flight Recorder reports the following issue:
The live set on the heap seems to increase with a speed of about 26,2 KiB per second during the recording.
An analysis of the reference tree found 1 leak candidates. The main candidate is java.util.concurrent.ConcurrentHashMap$Node[]
Referenced by this chain:
java.util.concurrent.ConcurrentHashMap.table
io.narayana.lra.coordinator.domain.service.LRAService.participants
io.narayana.lra.coordinator.internal.LRARecoveryModule.service
java.lang.Object[]
java.util.Vector.elementData
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery._recoveryModules
[Image: Flight Recorder diagnosis]
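Since the compose file exposes JMX on port 7091, the heap growth flagged by JFR can also be watched from a small JMX client. A sketch under these assumptions: the JDK's default agent service URL (.../jmxrmi), localhost:7091 reachable from where it runs (the coordinator uses host networking), and a made-up class name HeapWatcher. It requests a GC before each sample so the figure is closer to the live set JFR reports; that call is equivalent to System.gc() on the coordinator.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical diagnostic: samples the coordinator's used heap over JMX
class HeapWatcher {

    /** Growth rate in KiB per second between two used-heap samples given in bytes. */
    static double growthKiBPerSecond(long usedBefore, long usedAfter, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return (usedAfter - usedBefore) / 1024.0 / seconds;
    }

    public static void main(String[] args) throws Exception {
        // Port 7091 with SSL and authentication disabled, as set in JAVA_TOOL_OPTIONS (assumed reachable)
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7091/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            long intervalSeconds = 10;
            memory.gc(); // remote GC request so the sample approximates live data
            long previous = memory.getHeapMemoryUsage().getUsed();
            for (int i = 0; i < 6; i++) {
                TimeUnit.SECONDS.sleep(intervalSeconds);
                memory.gc();
                long current = memory.getHeapMemoryUsage().getUsed();
                System.out.printf("used heap: %d KiB, growth: %.1f KiB/s%n",
                        current / 1024, growthKiBPerSecond(previous, current, intervalSeconds));
                previous = current;
            }
        }
    }
}
```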
And the errors on the Narayana LRA coordinator:
2023-08-15 19:16:35,438 DEBUG [org.jbo.res.rea.com.cor.AbstractResteasyReactiveContext] (executor-thread-59) Restarting handler chain for exception exception: java.lang.OutOfMemoryError: Java heap space
2023-08-15 19:16:37,729 DEBUG [org.jbo.res.rea.com.cor.AbstractResteasyReactiveContext] (executor-thread-59) Attempting to handle unrecoverable exception: java.lang.OutOfMemoryError: Java heap space
2023-08-15 19:16:38,327 DEBUG [io.ver.ext.web.RoutingContext] (executor-thread-59) RoutingContext failure (500): java.lang.OutOfMemoryError: Java heap space
The field holding the leaked data is io.narayana.lra.coordinator.domain.service.LRAService.participants:
public class LRAService {
    private final Map<String, String> participants = new ConcurrentHashMap<>();
    // ...
}
In the LRAService class, I could not find any place where entries are removed from this map.
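A hypothetical, stripped-down model of this pattern shows why the heap grows with the number of sagas run: an entry is added per participant and nothing removes it when the LRA ends. LeakModel, join and finish are illustrative names, not Narayana's API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the leak: entries are added on join and never removed
class LeakModel {
    private final Map<String, String> participants = new ConcurrentHashMap<>();

    /** A participant joins an LRA: remember where it should be notified. */
    void join(String participantId, String endpoint) {
        participants.put(participantId, endpoint);
    }

    /** The LRA closes or cancels; the map is left untouched, so the entry outlives the LRA. */
    void finish(String participantId) {
        // intentionally no participants.remove(participantId)
    }

    int size() {
        return participants.size();
    }

    public static void main(String[] args) {
        LeakModel service = new LeakModel();
        for (int i = 0; i < 100_000; i++) {
            String id = UUID.randomUUID().toString();
            service.join(id, "http://localhost/compensate/" + i);
            service.finish(id);
        }
        // Every finished LRA still holds an entry
        System.out.println("entries left: " + service.size());
    }
}
```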
Is this a bug? A misconfiguration of Narayana LRA? A bug in Apache Camel Saga?
Thanks a lot.
2 Answers
Answer #1 (wlzqhblo):
I got an answer on the Narayana Zulip chat:
https://narayana.zulipchat.com/#narrow/stream/323714-users/topic/apache.20camel.20saga.20causes.20Narayana.20-LRA.20memory.20leak/near/386195429
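Michael Musgrove: Well spotted. The participant should be removed from the map when the transaction finishes (or updated if the participant moves).
We'll create an issue you can use to track the fix.
Michael Musgrove: You can track our progress in issue https://issues.redhat.com/browse/JBTM-3795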
Issue description:
The LRA module maintains a map of participants [1] which should be cleaned up when the LRA finishes [2] (in case participants want to be notified on a different endpoint).
[1] http://github.com/jbosstm/narayana/blob/7.0.0.Final/rts/lra/coordinator/src/main/java/io/narayana/lra/coordinator/domain/service/LRAService.java#L46
[2] https://github.com/jbosstm/narayana/blob/7.0.0.Final/rts/lra/coordinator/src/main/java/io/narayana/lra/coordinator/domain/service/LRAService.java#L195
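A sketch of the cleanup the issue describes, using hypothetical names (ParticipantRegistry, move and finish are illustrative; this is not the actual Narayana patch): the entry is overwritten when a participant asks to be notified on a different endpoint and removed once the LRA finishes, so the map only holds participants of active LRAs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry that keeps entries only while their LRA is active
class ParticipantRegistry {
    private final Map<String, String> participants = new ConcurrentHashMap<>();

    void join(String participantId, String endpoint) {
        participants.put(participantId, endpoint);
    }

    /** The participant moved: replace() only updates an existing key, so a late move cannot re-add a finished entry. */
    void move(String participantId, String newEndpoint) {
        participants.replace(participantId, newEndpoint);
    }

    /** The LRA reached a final state: stop tracking the participant. */
    void finish(String participantId) {
        participants.remove(participantId);
    }

    String endpointOf(String participantId) {
        return participants.get(participantId);
    }

    int size() {
        return participants.size();
    }
}
```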
Answer #2 (js5cn81o):
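As written in the Zulip thread, this issue is related to Netty's direct memory usage not respecting the Docker memory limit. To fix the OOM error, you can set … as an environment variable. You can also limit io.netty.maxDirectMemory to a value below the Docker container memory limit (e.g. -Dio.netty.maxDirectMemory=100m).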
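To check which limits the coordinator JVM actually runs with, the JMX port from the compose file can be used to read its maximum heap, the JDK buffer-pool figures and the io.netty.maxDirectMemory system property. A sketch assuming the default jmxrmi service URL on localhost:7091 and a made-up class name; note that Netty may allocate direct memory without going through ByteBuffer.allocateDirect, in which case that memory does not appear in the "direct" buffer pool.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.RuntimeMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical diagnostic: reads the coordinator's memory limits over JMX
class CoordinatorMemoryLimits {

    /** Converts bytes to whole MiB (truncating). */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7091/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();

            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            long maxHeap = memory.getHeapMemoryUsage().getMax(); // -1 if undefined
            System.out.println(maxHeap < 0 ? "max heap: undefined" : "max heap: " + toMiB(maxHeap) + " MiB");

            // JDK-accounted NIO buffer pools ("direct", "mapped", ...)
            for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(conn, BufferPoolMXBean.class)) {
                System.out.println(pool.getName() + " buffers: used=" + toMiB(pool.getMemoryUsed())
                        + " MiB, capacity=" + toMiB(pool.getTotalCapacity()) + " MiB");
            }

            // Shows whether -Dio.netty.maxDirectMemory reached the coordinator JVM
            RuntimeMXBean runtime = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);
            System.out.println("io.netty.maxDirectMemory = "
                    + runtime.getSystemProperties().getOrDefault("io.netty.maxDirectMemory", "(not set)"));
        }
    }
}
```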