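- Title: Paddle-TRT runs out of memory (OOM) after running a RetinaNet model for a number of predictions
- Version and environment information:
1) PaddlePaddle version: v1.8.5
2) GPU: Tesla P4, TensorRT 7.0, CUDA 10, cuDNN 7.6
3) OS: Ubuntu 16.04
- Inference information
1) C++ inference: self-compiled Paddle-TRT inference library
- Reproduction: run the forward pass repeatedly in a while (true) loop.
- Problem description: when RetinaNet forward passes run in a while (true) loop, OOM errors start to appear after roughly 150-200 iterations. The log output is shown below: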
I1013 05:55:22.874019 593 analysis_predictor.cc:138] Profiler is deactivated, and no profiling report will be generated.
I1013 05:55:22.893450 593 analysis_predictor.cc:875] MODEL VERSION: 1.7.1
I1013 05:55:22.893493 593 analysis_predictor.cc:877] PREDICTOR VERSION: 1.8.5
I1013 05:55:23.047775 593 analysis_predictor.cc:432] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
I1013 05:55:24.338455 593 graph_pattern_detector.cc:101] --- detected 53 subgraphs
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1013 05:55:24.596479 593 tensorrt_subgraph_pass.cc:115] --- detect a sub-graph with 269 nodes
W1013 05:55:24.621840 593 tensorrt_subgraph_pass.cc:285] The Paddle lib links the 7011 version TensorRT, make sure the runtime TensorRT you are using is no less than this version, otherwise, there might be Segfault!
I1013 05:55:24.621924 593 tensorrt_subgraph_pass.cc:321] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I1013 05:55:25.212077 593 engine.cc:176] Run Paddle-TRT Dynamic Shape mode.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
I1013 05:56:20.910085 593 graph_pattern_detector.cc:101] --- detected 16 subgraphs
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I1013 05:56:20.916911 593 graph_pattern_detector.cc:101] --- detected 6 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1013 05:56:20.934197 593 ir_params_sync_among_devices_pass.cc:41] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I1013 05:56:21.084280 593 analysis_predictor.cc:496] ======= optimize end =======
W1013 05:56:21.299019 593 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.0
W1013 05:56:21.299244 593 device_context.cc:260] device: 0, cuDNN Version: 7.6.
###
### After roughly 150-200 normal predictions, the following log appears:
###
Cuda error in file src/implicit_gemm.cu at line 585: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
Cuda error in file src/winograd.cu at line 712: out of memory
E1013 06:04:18.647707 838 helper.h:76] ../rtSafe/cuda/caskConvolutionRunner.cpp (370) - Cuda Error in execute: 2 (out of memory)
E1013 06:04:18.647853 838 helper.h:76] FAILED_EXECUTION: std::exception
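Although the OOM errors are reported, the prediction results are still computed correctly, and nvidia-smi shows that plenty of GPU memory is still free.
What is causing this, and how can it be fixed?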
4 answers
#1
Hi, try setting the first argument of config.EnableTensorRtEngine (the maximum workspace size) to 1 << 30.
#2
@cryoco Hi, after changing it to 1<<30, the same problem still appeared at around the 300th prediction.
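#3
This may be caused by a GPU memory leak in TensorRT 6 and 7.0 that occurs when dynamic shape is used. The bug was fixed in TensorRT 7.1. Paddle-TRT supports TensorRT 7.1 starting from version 2.0-beta, which requires CUDA 10.2, cuDNN 8.0, and NVIDIA driver 440 or later.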
#4
OK, I'll try 2.0-beta. Thanks.