Title: Paddle-TRT runs out of memory (OOM) after a number of inference runs on a RetinaNet model
Version and environment information:

1) PaddlePaddle version: v1.8.5
2) GPU: Tesla P4, TensorRT 7.0, CUDA 10, cuDNN 7.6
3) OS: Ubuntu 16.04

Inference information

1) C++ inference: a self-compiled Paddle-TRT inference library

Reproduction: run the forward pass repeatedly in a `while(true)` loop.
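The reproduction loop can be sketched as a small C++ harness. This is a minimal sketch under stated assumptions, not the reporter's actual program: the forward pass is wrapped in a hypothetical `step` callable, which in the real program would refill the input tensor and call `predictor->ZeroCopyRun()` (the `bool`-returning run method of the Paddle 1.8 C++ predictor), and the loop is bounded by `max_iters` so that it terminates. `RunForwardLoop` is an illustrative name. Because the OOM messages in the log are printed by TensorRT while the results stay correct, it is only an assumption that a failing run would surface as a `false` return value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>

// Bounded version of the while(true) reproduction loop (hypothetical helper).
// Calls `step` (one forward pass) up to `max_iters` times and returns how many
// calls succeeded. Stops at the first call that returns false and reports the
// zero-based index of that failing iteration on stderr.
std::size_t RunForwardLoop(const std::function<bool()>& step,
                           std::size_t max_iters) {
  assert(step && "step must wrap a forward pass");
  std::size_t completed = 0;
  while (completed < max_iters) {
    if (!step()) {
      std::cerr << "forward pass failed at iteration " << completed << '\n';
      break;
    }
    ++completed;
  }
  return completed;
}
```

In the real program the call might look like `RunForwardLoop([&] { /* refill input */ return predictor->ZeroCopyRun(); }, 500)`; counting completed iterations makes it easier to report exactly when the first failure appears.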
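Problem description: when the RetinaNet forward pass is run repeatedly in a `while(true)` loop, OOM errors start to appear after roughly 150-200 iterations. The log is shown below: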

I1013 05:55:22.874019   593 analysis_predictor.cc:138] Profiler is deactivated, and no profiling report will be generated.
I1013 05:55:22.893450   593 analysis_predictor.cc:875] MODEL VERSION: 1.7.1
I1013 05:55:22.893493   593 analysis_predictor.cc:877] PREDICTOR VERSION: 1.8.5
I1013 05:55:23.047775   593 analysis_predictor.cc:432] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
I1013 05:55:24.338455   593 graph_pattern_detector.cc:101] ---  detected 53 subgraphs
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1013 05:55:24.596479   593 tensorrt_subgraph_pass.cc:115] ---  detect a sub-graph with 269 nodes
W1013 05:55:24.621840   593 tensorrt_subgraph_pass.cc:285] The Paddle lib links the 7011 version TensorRT, make sure the runtime TensorRT you are using is no less than this version, otherwise, there might be Segfault!
I1013 05:55:24.621924   593 tensorrt_subgraph_pass.cc:321] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I1013 05:55:25.212077   593 engine.cc:176] Run Paddle-TRT Dynamic Shape mode.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
I1013 05:56:20.910085   593 graph_pattern_detector.cc:101] ---  detected 16 subgraphs
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I1013 05:56:20.916911   593 graph_pattern_detector.cc:101] ---  detected 6 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1013 05:56:20.934197   593 ir_params_sync_among_devices_pass.cc:41] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I1013 05:56:21.084280   593 analysis_predictor.cc:496] ======= optimize end =======
W1013 05:56:21.299019   593 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.0
W1013 05:56:21.299244   593 device_context.cc:260] device: 0, cuDNN Version: 7.6.


### After about 150-200 successful predictions, the following log appears:


Cuda error in file src/implicit_gemm.cu at line 585: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
E1013 06:04:18.647707   838 helper.h:76] ../rtSafe/cuda/caskConvolutionRunner.cpp (370) - Cuda Error in execute: 2 (out of memory)
E1013 06:04:18.647853   838 helper.h:76] FAILED_EXECUTION: std::exception

Although OOM errors are reported, the predictions are still computed correctly, and nvidia-smi shows that plenty of GPU memory is still free.
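nvidia-smi only shows a snapshot taken when it runs, so an allocation that lives only briefly inside one forward pass may not appear in it. A minimal C++ sketch for recording free device memory after every iteration from inside the process is shown below. It assumes the forward pass is wrapped in a `step` callable and that free memory is read through a `query_free` callable, which in a CUDA program could wrap `cudaMemGetInfo(&free, &total)`. The names `SampleFreeMemory`, `Summarize`, `LooksLikeLeak`, and the tolerance heuristic are hypothetical and are not part of Paddle or TensorRT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Summary of free-memory samples taken after each forward pass.
struct FreeMemStats {
  std::size_t first = 0;     // free bytes after the first iteration
  std::size_t last = 0;      // free bytes after the last iteration
  std::size_t min_free = 0;  // lowest free value seen in any sample
};

// Runs `step` `iters` times and records `query_free()` after each call.
// In a CUDA program, `query_free` could wrap cudaMemGetInfo(&free, &total)
// (an assumption of this sketch; it reports device-wide free memory).
std::vector<std::size_t> SampleFreeMemory(
    const std::function<void()>& step,
    const std::function<std::size_t()>& query_free, std::size_t iters) {
  std::vector<std::size_t> samples;
  samples.reserve(iters);
  for (std::size_t i = 0; i < iters; ++i) {
    step();
    samples.push_back(query_free());
  }
  return samples;
}

// Reduces a non-empty list of samples to first / last / minimum values.
FreeMemStats Summarize(const std::vector<std::size_t>& samples) {
  assert(!samples.empty());
  FreeMemStats s;
  s.first = samples.front();
  s.last = samples.back();
  s.min_free = *std::min_element(samples.begin(), samples.end());
  return s;
}

// Hypothetical heuristic: a drop of more than `tolerance` bytes between the
// first and last sample is treated as memory that is never released.
bool LooksLikeLeak(const FreeMemStats& s, std::size_t tolerance) {
  return s.first > s.last && s.first - s.last > tolerance;
}
```

Like nvidia-smi, samples taken between calls cannot see transient peaks inside a call; what they can distinguish is a steady decline across iterations (pointing toward memory that is never released) from a flat profile (pointing toward allocations made only briefly during execution).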
What is causing this, and how can it be fixed?