Paddle: paddle-trt OOM after a certain number of predictions with the RetinaNet model

wixjitnu  posted on 2022-10-20
  • Title: paddle-trt OOM after a certain number of predictions with the RetinaNet model
  • Version / environment info:

   1) PaddlePaddle version: v1.8.5
   2) GPU: Tesla P4, TensorRT 7.0, CUDA 10, cuDNN 7.6
   3) OS: Ubuntu 16.04

  • Inference info

   1) C++ inference: self-compiled paddle-trt inference library

  • Repro info: run the forward pass in a while(True) loop (a minimal sketch of the loop is given at the end of this post).
  • Problem description: running RetinaNet's forward pass in a while(True) loop starts hitting OOM after roughly 150-200 iterations; the printed log is shown below:
I1013 05:55:22.874019   593 analysis_predictor.cc:138] Profiler is deactivated, and no profiling report will be generated.
I1013 05:55:22.893450   593 analysis_predictor.cc:875] MODEL VERSION: 1.7.1
I1013 05:55:22.893493   593 analysis_predictor.cc:877] PREDICTOR VERSION: 1.8.5
I1013 05:55:23.047775   593 analysis_predictor.cc:432] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
I1013 05:55:24.338455   593 graph_pattern_detector.cc:101] ---  detected 53 subgraphs
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1013 05:55:24.596479   593 tensorrt_subgraph_pass.cc:115] ---  detect a sub-graph with 269 nodes
W1013 05:55:24.621840   593 tensorrt_subgraph_pass.cc:285] The Paddle lib links the 7011 version TensorRT, make sure the runtime TensorRT you are using is no less than this version, otherwise, there might be Segfault!
I1013 05:55:24.621924   593 tensorrt_subgraph_pass.cc:321] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I1013 05:55:25.212077   593 engine.cc:176] Run Paddle-TRT Dynamic Shape mode.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
I1013 05:56:20.910085   593 graph_pattern_detector.cc:101] ---  detected 16 subgraphs
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I1013 05:56:20.916911   593 graph_pattern_detector.cc:101] ---  detected 6 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1013 05:56:20.934197   593 ir_params_sync_among_devices_pass.cc:41] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I1013 05:56:21.084280   593 analysis_predictor.cc:496] ======= optimize end =======
W1013 05:56:21.299019   593 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.0
W1013 05:56:21.299244   593 device_context.cc:260] device: 0, cuDNN Version: 7.6.

### After roughly 150-200 normal predictions, the following log appears:

Cuda error in file src/implicit_gemm.cu at line 585: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
E1013 06:04:18.647707   838 helper.h:76] ../rtSafe/cuda/caskConvolutionRunner.cpp (370) - Cuda Error in execute: 2 (out of memory)
E1013 06:04:18.647853   838 helper.h:76] FAILED_EXECUTION: std::exception

Although the OOM errors are reported, the prediction results are still computed correctly, and nvidia-smi shows plenty of free GPU memory.
What is causing this, and how can it be fixed?
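
For reference, a minimal sketch of the reproduction loop. The input name "image", the 1x3x800x1333 shape, and the dummy data are placeholders; the real RetinaNet preprocessing and inputs are omitted.

```cpp
#include <vector>
#include "paddle_inference_api.h"

// Minimal repro-loop sketch; "image" and the 1x3x800x1333 shape are
// placeholders, not the actual RetinaNet inputs.
void RunForever(paddle::PaddlePredictor* predictor) {
  std::vector<float> image(1 * 3 * 800 * 1333, 0.f);  // dummy input data
  paddle::PaddleTensor input;
  input.name = "image";
  input.shape = {1, 3, 800, 1333};
  input.dtype = paddle::PaddleDType::FLOAT32;
  input.data = paddle::PaddleBuf(image.data(), image.size() * sizeof(float));

  std::vector<paddle::PaddleTensor> outputs;
  while (true) {
    predictor->Run({input}, &outputs);  // OOM appears after ~150-200 iterations
  }
}
```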

yacmzcpb1#

Hi, you can try setting the first parameter of config.EnableTensorRtEngine, max_workspace, to 1 << 30.
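
A minimal sketch of where this parameter goes, assuming the Paddle 1.8 C++ AnalysisConfig API; the model paths and the remaining EnableTensorRtEngine arguments are placeholders, not taken from your setup.

```cpp
#include <memory>
#include "paddle_inference_api.h"

// Sketch: the enlarged TRT workspace is the first argument of EnableTensorRtEngine.
std::unique_ptr<paddle::PaddlePredictor> CreateTrtPredictor() {
  paddle::AnalysisConfig config;
  config.SetModel("retinanet/__model__", "retinanet/__params__");  // hypothetical paths
  config.EnableUseGpu(500 /* initial GPU memory pool, MB */, 0 /* device id */);
  config.EnableTensorRtEngine(
      1 << 30,                                      // max_workspace: 1 GB, as suggested above
      1,                                            // max_batch_size
      3,                                            // min_subgraph_size
      paddle::AnalysisConfig::Precision::kFloat32,  // precision
      false,                                        // use_static
      false);                                       // use_calib_mode
  return paddle::CreatePaddlePredictor(config);
}
```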

bxfogqkk2#

@cryoco Hi, after changing it to 1 << 30, the same problem still occurred at around the 300th prediction.

5kgi1eie3#

This may be caused by a GPU memory leak bug in TRT 6 and TRT 7.0 when using dynamic shapes. The bug was fixed in TRT 7.1. Paddle-TRT supports TRT 7.1 starting from version 2.0-beta, which requires CUDA 10.2, cuDNN 8.0, and a driver version of 440 or above.

ki0zmccv4#

OK, I'll try 2.0-beta, thanks.
