Spring Boot application using Apache Camel Saga causes a memory leak in Narayana Long Running Actions (LRA) - java.lang.OutOfMemoryError: Java heap space

23c0lvtd posted on 2023-10-18 in Apache

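Changing the saga example from camel-spring-boot-examples/saga so that it successfully executes many saga transactions causes the narayana-lra server to fail and restart with java.lang.OutOfMemoryError: Java heap space.
I am using Camel org.apache.camel.springboot.example version 3.20.0.
pom.xml: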

<parent>
        <groupId>org.apache.camel.springboot</groupId>
        <artifactId>spring-boot</artifactId>
        <version>3.20.0</version>
    </parent>

    <groupId>org.apache.camel.springboot.example</groupId>
    <artifactId>examples</artifactId>
    <name>Camel SB :: Examples</name>
    <description>Camel Examples</description>
    <packaging>pom</packaging>
    <properties>
        <camel-version>3.20.0</camel-version>
        <skip.starting.camel.context>false</skip.starting.camel.context>
        <javax.servlet.api.version>4.0.1</javax.servlet.api.version>
        <jkube-maven-plugin-version>1.9.1</jkube-maven-plugin-version>
        <kafka-avro-serializer-version>5.2.2</kafka-avro-serializer-version>
        <reactor-version>3.2.16.RELEASE</reactor-version>
        <testcontainers-version>1.16.3</testcontainers-version>
        <hapi-structures-v24-version>2.3</hapi-structures-v24-version>
        <narayana-spring-boot-version>2.6.3</narayana-spring-boot-version>
    </properties>

Changed docker-compose:

version: "3.9"
services:
  lra-coordinator:
    image: "quay.io/jbosstm/lra-coordinator:7.0.0.Final-3.2.2.Final"
    network_mode: "host"
    deploy:
      resources:
        limits:
          memory: 400M
    environment:
      - 'JAVA_TOOL_OPTIONS=-Dquarkus.log.level=DEBUG 
        -Dcom.sun.management.jmxremote=true
        -Dcom.sun.management.jmxremote.port=7091
        -Dcom.sun.management.jmxremote.ssl=false 
        -Dcom.sun.management.jmxremote.authenticate=false'

  amq-broker:
    image: "registry.redhat.io/amq7/amq-broker-rhel8:7.10"
    environment:
      - AMQ_USER=admin
      - AMQ_PASSWORD=admin
      - AMQ_REQUIRE_LOGIN=true
    ports:
      - "8161:8161"
      - "61616:61616"

Changed SagaRoute:

public class SagaRoute extends RouteBuilder {

    private static final String DIRECT_SAGA = "direct:saga";
    @Autowired
    private ProducerTemplate producerTemplate;

    private final ExecutorService executor = Executors.newFixedThreadPool(10);

    @Override
    public void configure() throws Exception {

        rest().get("/perf")
                .param().type(RestParamType.query).name("n").dataType("int").required(true).endParam()
                .to("direct:perf");

        from("direct:perf")
            .process(exchange -> {
                Integer limit = exchange.getMessage().getHeader("n", Integer.class);
                for (int i = 0; i < limit; i++) {
                    int finalI = i;
                    executor.submit(() -> producerTemplate.sendBodyAndHeader(DIRECT_SAGA, finalI, "id", finalI));
                }
            });

        rest().post("/saga")
                .param().type(RestParamType.query).name("id").dataType("int").required(true).endParam()
                .to(DIRECT_SAGA);

        from(DIRECT_SAGA)
                .saga()
                .compensation("direct:cancelOrder")
                    .log("Executing saga #${header.id} with LRA ${header.Long-Running-Action}")
                    .setHeader("payFor", constant("train"))
                    .to("activemq:queue:{{example.services.train}}?exchangePattern=InOut" +
                            "&replyTo={{example.services.train}}.reply")
                    .log("train seat reserved for saga #${header.id} with payment transaction: ${body}")
                    .setHeader("payFor", constant("flight"))
                    .to("activemq:queue:{{example.services.flight}}?exchangePattern=InOut" +
                            "&replyTo={{example.services.flight}}.reply")
                    .log("flight booked for saga #${header.id} with payment transaction: ${body}")
                .setBody(header("Long-Running-Action"))
                .end();

        from("direct:cancelOrder")
                .log("Transaction ${header.Long-Running-Action} has been cancelled due to flight or train failure");

    }

}

Changed PaymentRoute (original: https://github.com/apache/camel-spring-boot-examples/blob/camel-spring-boot-examples-3.20.0/saga/saga-payment-service/src/main/java/org/apache/camel/example/saga/PaymentRoute.java):

public class PaymentRoute extends RouteBuilder {

    @Override
    public void configure() throws Exception {

                from("activemq:queue:{{example.services.payment}}")
                .routeId("payment-service")
                .saga()
                    .propagation(SagaPropagation.MANDATORY)
                    .option("id", header("id"))
                    .compensation("direct:cancelPayment")
                    .log("Paying ${header.payFor} for order #${header.id}")
                    .setBody(header("JMSCorrelationID"))
                    .log("Payment ${header.payFor} done for order #${header.id} with payment transaction ${body}")
                .end();

        from("direct:cancelPayment")
                .routeId("payment-cancel")
                .log("Payment for order #${header.id} has been cancelled");
    }
}

Run run-local.sh to start the services.
Then run the command below, passing the number of transactions to complete:

curl http://localhost:8084/api/perf?n=100000

After doing this, Flight Recorder reports the following problem:

The live set on the heap seems to increase with a speed of about 26,2 KiB per second during the recording.
An analysis of the reference tree found 1 leak candidates. The main candidate is java.util.concurrent.ConcurrentHashMap$Node[] 
Referenced by this chain:
java.util.concurrent.ConcurrentHashMap.table
io.narayana.lra.coordinator.domain.service.LRAService.participants
io.narayana.lra.coordinator.internal.LRARecoveryModule.service
java.lang.Object[]
java.util.Vector.elementData
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery._recoveryModules

[Image: Flight Recorder diagnosis]
The Narayana LRA coordinator also logs these errors:

2023-08-15 19:16:35,438 DEBUG [org.jbo.res.rea.com.cor.AbstractResteasyReactiveContext] (executor-thread-59) Restarting handler chain for exception exception: java.lang.OutOfMemoryError: Java heap space

2023-08-15 19:16:37,729 DEBUG [org.jbo.res.rea.com.cor.AbstractResteasyReactiveContext] (executor-thread-59) Attempting to handle unrecoverable exception: java.lang.OutOfMemoryError: Java heap space

2023-08-15 19:16:38,327 DEBUG [io.ver.ext.web.RoutingContext] (executor-thread-59) RoutingContext failure (500): java.lang.OutOfMemoryError: Java heap space

The field that holds the leaking data is:
io.narayana.lra.coordinator.domain.service.LRAService.participants

public class LRAService {
    private final Map<String, String> participants = new ConcurrentHashMap<>();

In the LRAService class, I could not find where items are removed from this Map.
Is this a bug? A misconfiguration of Narayana LRA? A bug in Apache Camel Saga?
Thank you very much.


wlzqhblo1#

I received an answer on the Narayana Zulip chat:
https://narayana.zulipchat.com/#narrow/stream/323714-users/topic/apache.20camel.20saga.20causes.20Narayana.20-LRA.20memory.20leak/near/386195429
Michael Musgrove: Good find. The participants should be removed here when the transaction finishes (or updated if the participant moves).
We will raise an issue so that you can track the fix.
Michael Musgrove: You can track our progress using the issue https://issues.redhat.com/browse/JBTM-3795
Issue content
Description: The LRA module maintains a map of participants [1], which should be cleaned up when the LRA finishes [2] (if the participant wishes to be notified on a different endpoint).
[1] http://github.com/jbosstm/narayana/blob/7.0.0.Final/rts/lra/coordinator/src/main/java/io/narayana/lra/coordinator/domain/service/LRAService.java#L46
[2] https://github.com/jbosstm/narayana/blob/7.0.0.Final/rts/lra/coordinator/src/main/java/io/narayana/lra/coordinator/domain/service/LRAService.java#L195
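
To illustrate the cleanup the issue describes, here is a minimal sketch of a registry that drops its entries when an LRA finishes. It is not Narayana's code: the class ParticipantRegistrySketch, its methods join and finished, and the string ids are all hypothetical, and the real LRAService may key its map differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a coordinator-side registry; NOT Narayana's LRAService.
public class ParticipantRegistrySketch {

    // Maps a participant id to the id of the LRA it joined (assumed layout).
    private final Map<String, String> participants = new ConcurrentHashMap<>();

    // Called when a participant enlists in an LRA.
    public void join(String participantId, String lraId) {
        participants.put(participantId, lraId);
    }

    // Called when an LRA closes or cancels: remove every entry that belongs to it.
    // Without this step the map keeps one entry per enlistment for the life of the JVM.
    // The linear scan keeps the sketch short; a real fix would likely index by LRA.
    public void finished(String lraId) {
        participants.values().removeIf(lraId::equals);
    }

    public int size() {
        return participants.size();
    }

    public static void main(String[] args) {
        ParticipantRegistrySketch registry = new ParticipantRegistrySketch();
        for (int i = 0; i < 100_000; i++) {
            String lraId = "lra-" + i;
            registry.join("train-" + i, lraId);
            registry.join("flight-" + i, lraId);
            registry.finished(lraId);
        }
        System.out.println("entries left after all LRAs finished: " + registry.size());
    }
}
```

If the call to finished is commented out, the same loop leaves 200,000 entries in the map, which is the growth pattern Flight Recorder pointed at in the question.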


js5cn81o2#

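As written in the Zulip thread, this problem is related to Netty's direct memory access not respecting the Docker memory limit. To fix the OOM error, you can set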

JAVA_TOOL_OPTIONS='-Dio.netty.maxDirectMemory=0'

as an environment variable. You can also limit io.netty.maxDirectMemory to a value below the Docker container memory limit (e.g. -Dio.netty.maxDirectMemory=100m).
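
To confirm that the option actually reached the JVM inside the container, a small check like the one below may help. It is only a sketch: the class name DirectMemorySettingCheck and the helper describe are made up for illustration, it reads JVM state through standard JDK APIs only, and it does not show how Netty itself interprets the value.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical diagnostic helper; run it with the same JAVA_TOOL_OPTIONS as the coordinator.
public class DirectMemorySettingCheck {

    // Returns the raw value of a system property, or a marker when it is unset.
    static String describe(String key) {
        String value = System.getProperty(key);
        return value == null ? "(not set)" : value;
    }

    public static void main(String[] args) {
        // The property passed through -Dio.netty.maxDirectMemory=...
        System.out.println("io.netty.maxDirectMemory = " + describe("io.netty.maxDirectMemory"));

        // Heap ceiling the JVM chose, for comparison with the container limit.
        System.out.println("max heap bytes = " + Runtime.getRuntime().maxMemory());

        // Buffer pools tracked by the JDK (typically "direct" and "mapped").
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + " pool: used=" + pool.getMemoryUsed()
                    + " bytes, capacity=" + pool.getTotalCapacity() + " bytes");
        }
    }
}
```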
