Paddle PPyolo2 + TensorRT C++ inference deployment on Win10: problem with generating the serialized model

pod7payv, posted on 2021-11-30 in Java

1) PaddlePaddle version: 2.1
2) GPU: 2080 Super, CUDA 10.2 and cuDNN 7
3) OS: Win10
4) C++ inference: contents of version.txt
GIT COMMIT ID: 4ccd9a0
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
CUDA version: 10.2
CUDNN version: v7.6
CXX compiler version: 19.16.27045.0
WITH_TENSORRT: ON
TensorRT version: v7
5) Source of the inference library: official download

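Problem: on Win10, using the officially provided detection inference library with TensorRT configured and the project generated with CMake, the serialized model is regenerated every time the program runs with TensorRT. This does not happen on Ubuntu. In addition, generating the serialized model on Windows is very slow.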

kr98yfug #1

Hi! We have received your issue and will arrange for a technician to answer it as soon as possible, so please be patient. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You can also look for an answer in the official website, the API documentation, the FAQ, past GitHub Issues, and the AI community. Have a nice day!

t0ybt7op #2

Hello, could you provide the inference log from the run in which the serialized model is generated?

8ljdwjyq #3

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0602 15:00:13.355023 15880 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0602 15:00:13.406883 15880 analysis_predictor.cc:522] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0602 15:00:14.175864 15880 graph_pattern_detector.cc:101] --- detected 101 subgraphs
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0602 15:00:14.285533 15880 graph_pattern_detector.cc:101] --- detected 107 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0602 15:00:14.566781 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:14.575757 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:15.157200 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:00:30.641460 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:30.642489 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:30.643487 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:00:44.689169 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:44.690166 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:44.692162 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:00:58.561785 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:58.563781 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:58.564779 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:01:13.806190 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
I0602 15:01:13.807195 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:01:13.808144 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:01:18.347720 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 110 nodes
I0602 15:01:18.362681 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:01:18.384621 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:06:32.004756 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 10 nodes
I0602 15:06:32.006747 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:06:32.009743 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:06:49.696780 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 10 nodes
I0602 15:06:49.698774 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:06:49.701767 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:07:07.526752 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:07:07.527715 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:07:07.529747 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:07:27.537101 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 7 nodes
I0602 15:07:27.538098 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:07:27.540093 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:07:53.054272 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 7 nodes
I0602 15:07:53.055269 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:07:53.059258 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:15.018090 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
I0602 15:08:15.019086 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:15.019086 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:19.482378 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:08:19.483381 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:19.484373 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:33.088526 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:08:33.089483 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:33.090479 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:52.708403 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:08:52.709401 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:52.710398 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:09:06.350334 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
I0602 15:09:06.351332 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:09:06.352332 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:09:10.719928 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:09:10.720925 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:09:10.722921 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:09:25.621057 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:09:25.622053 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:09:25.623050 15880 engine.cc:88] Run Paddle-TRT FP16 mode
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0602 15:09:34.398572 15880 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0602 15:09:34.463398 15880 memory_optimize_pass.cc:201] Cluster name : concat_4.tmp_0 size: 19660800
I0602 15:09:34.463398 15880 memory_optimize_pass.cc:201] Cluster name : nearest_interp_v2_1.tmp_0 size: 6553600
I0602 15:09:34.464413 15880 memory_optimize_pass.cc:201] Cluster name : im_shape size: 8
I0602 15:09:34.464413 15880 memory_optimize_pass.cc:201] Cluster name : transpose_1.tmp_0 size: 211200
I0602 15:09:34.466395 15880 memory_optimize_pass.cc:201] Cluster name : softplus_20.tmp_0 size: 3276800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : transpose_0.tmp_0 size: 52800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : tanh_20.tmp_0 size: 3276800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : yolo_box_1.tmp_0 size: 76800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : yolo_box_0.tmp_0 size: 19200
I0602 15:09:34.468385 15880 memory_optimize_pass.cc:201] Cluster name : scale_factor size: 8
I0602 15:09:34.468385 15880 memory_optimize_pass.cc:201] Cluster name : cast_0.tmp_0 size: 8
--- Running analysis [ir_graph_to_program_pass]
I0602 15:09:34.637933 15880 analysis_predictor.cc:598] ======= optimize end =======
I0602 15:09:34.637933 15880 naive_executor.cc:107] --- skip [feed], feed -> scale_factor
I0602 15:09:34.638929 15880 naive_executor.cc:107] --- skip [feed], feed -> image
I0602 15:09:34.638929 15880 naive_executor.cc:107] --- skip [feed], feed -> im_shape
I0602 15:09:34.646908 15880 naive_executor.cc:107] --- skip [nearest_interp_v2_1.tmp_0], fetch -> fetch
I0602 15:09:34.646908 15880 naive_executor.cc:107] --- skip [concat_4.tmp_0], fetch -> fetch
W0602 15:09:34.656881 15880 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.1, Runtime API Version: 10.0
W0602 15:09:34.657894 15880 device_context.cc:372] device: 0, cuDNN Version: 7.6.
W0602 15:09:35.054816 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.060801 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.063792 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.066797 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.069777 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.072768 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.076758 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.086731 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.088726 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.090720 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.094712 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.096704 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.098709 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.102689 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.104683 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.106678 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.108673 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.111665 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Inference: 64.061005 ms per batch image
class=2 confidence=0.2698 rect=[193 204 156 202]
Visualized output saved as output\126_1.jpg
Inference: 22.954700 ms per batch image
class=2 confidence=0.2224 rect=[193 204 184 231]
Visualized output saved as output\126_2.jpg
Inference: 21.290199 ms per batch image
class=0 confidence=0.2029 rect=[249 621 0 175]
Visualized output saved as output\126_5.jpg
Inference: 21.661400 ms per batch image
Visualized output saved as output\35_1.jpg
Inference: 21.788601 ms per batch image
Visualized output saved as output\35_2.jpg
Inference: 22.311899 ms per batch image
Visualized output saved as output\35_3.jpg
Inference: 21.007299 ms per batch image
Visualized output saved as output\35_4.jpg
Inference: 21.519300 ms per batch image
Visualized output saved as output\39_1.jpg
Inference: 22.053101 ms per batch image
class=0 confidence=0.2046 rect=[259 426 9 214]
Visualized output saved as output\39_2.jpg
Inference: 20.740301 ms per batch image
class=0 confidence=0.3153 rect=[228 275 5 191]
Visualized output saved as output\39_3.jpg

D:\projects\PaddleDetection\deploy\cpp\Release\main.exe (process 1352) exited with return code -1073740791.

7vux5j2d #4

Are you using the C++ deployment from the PaddleDetection suite, or calling the Paddle Inference API directly?

wooyq4lh #5

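I call the Paddle Inference API directly.
This is with version 1.8: the log above is from the first run, when the serialized model was generated, and the log below is from the second run, which fails with an error. With version 2.1, by contrast, the serialized model is regenerated on every run.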
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0602 15:14:36.549612 20144 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0602 15:14:36.602471 20144 analysis_predictor.cc:522] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0602 15:14:37.383383 20144 graph_pattern_detector.cc:101] --- detected 101 subgraphs
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0602 15:14:37.492090 20144 graph_pattern_detector.cc:101] --- detected 107 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0602 15:14:37.773383 20144 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 7 nodes
I0602 15:14:37.776376 20144 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:14:38.031678 20144 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:15:00.974486 20144 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:15:00.975484 20144 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:15:00.977491 20144 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:15:20.102705 20144 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
E0602 15:15:20.104661 20144 helper.h:78] C:\source\rtSafe\coreReadArchive.cpp (55) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)
E0602 15:15:20.107652 20144 helper.h:78] INVALID_STATE: Unknown exception
E0602 15:15:20.108650 20144 helper.h:78] INVALID_CONFIG: Deserialize the cuda engine failed.

sulc1iza #6

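I tried two builds of the C++ inference library paddle_inference.zip, one for CUDA 10.2 and one for CUDA 10.0. With the CUDA 10.0 build, after TensorRT has generated the serialized model, the second launch of the project fails with the error shown in the log above:
E0602 15:15:20.104661 20144 helper.h:78] C:\source\rtSafe\coreReadArchive.cpp (55) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)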
E0602 15:15:20.107652 20144 helper.h:78] INVALID_STATE: Unknown exception
E0602 15:15:20.108650 20144 helper.h:78] INVALID_CONFIG: Deserialize the cuda engine failed.

91zkwejq #7

Thanks. I used this method and it seems to have solved the problem.

xmd2e60i #8

The order of these settings should be swapped: DeletePass should come after all the other configuration calls.
config.EnableMemoryOptim();
config.pass_builder()->DeletePass("conv_bn_fuse_pass");
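
Why the order can matter is easiest to see with a toy model. The sketch below is not the Paddle API: ToyConfig and its pass list are hypothetical stand-ins, and it assumes (not confirmed in this thread) that an option setter such as EnableMemoryOptim rebuilds the pass list from its defaults. Under that assumption, a pass deleted before the setter is called comes back, which would be consistent with conv_bn_fuse_pass still appearing in the log above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an inference config. NOT the Paddle API.
class ToyConfig {
 public:
  ToyConfig() { RebuildPasses(); }

  // Assumption for illustration: changing an option rebuilds the pass
  // list from its defaults, discarding earlier manual edits.
  void EnableMemoryOptim() {
    memory_optim_ = true;
    RebuildPasses();
  }

  // Remove every occurrence of the named pass from the current list.
  void DeletePass(const std::string& name) {
    passes_.erase(std::remove(passes_.begin(), passes_.end(), name),
                  passes_.end());
  }

  bool HasPass(const std::string& name) const {
    return std::find(passes_.begin(), passes_.end(), name) != passes_.end();
  }

  bool memory_optim() const { return memory_optim_; }

 private:
  void RebuildPasses() {
    passes_ = {"conv_bn_fuse_pass", "fc_fuse_pass", "tensorrt_subgraph_pass"};
  }

  bool memory_optim_ = false;
  std::vector<std::string> passes_;
};

// Delete first, then change an option: the pass is restored.
bool PassPresentWhenDeletedFirst() {
  ToyConfig config;
  config.DeletePass("conv_bn_fuse_pass");
  config.EnableMemoryOptim();
  return config.HasPass("conv_bn_fuse_pass");
}

// Change the option first, then delete: the deletion sticks.
bool PassPresentWhenDeletedLast() {
  ToyConfig config;
  config.EnableMemoryOptim();
  config.DeletePass("conv_bn_fuse_pass");
  return config.HasPass("conv_bn_fuse_pass");
}
```

In this toy model PassPresentWhenDeletedFirst() returns true and PassPresentWhenDeletedLast() returns false, which is the behavior the advice above is guarding against.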

bfrts1fy #9

The problem in version 1.8 is probably here: https://github.com/PaddlePaddle/Paddle/blob/75efc0ace116ca2ad2ed3b5ac5d16bcec01e2004/paddle/fluid/inference/analysis/helper.h#L246 . If you build from source, you can try changing that line to std::ofstream outfile(trt_serialized_path, std::ios::binary).
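
For background on why std::ios::binary matters: on Windows, a stream opened in text mode translates each '\n' byte into "\r\n" on write, so a binary blob containing 0x0A bytes ends up longer on disk than it was in memory. That would be consistent with the "Length in header does not match remaining archive length" error, and with the problem not appearing on Ubuntu, where no such translation happens. The following is a minimal sketch of a binary-safe round trip; the helper names and the payload are made up for illustration and are not taken from the Paddle source.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Write raw bytes in binary mode so no newline translation is applied.
bool WriteBinary(const std::string& path, const std::string& data) {
  std::ofstream out(path, std::ios::binary);
  if (!out) return false;
  out.write(data.data(), static_cast<std::streamsize>(data.size()));
  out.close();
  return !out.fail();
}

// Read the whole file back, again in binary mode.
std::string ReadBinary(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Round-trip a payload that deliberately contains 0x0A, 0x0D and 0x00
// bytes, the kind of data a text-mode stream on Windows would alter.
bool RoundTripIsExact(const std::string& path) {
  const char bytes[] = {0x01, 0x0A, 0x00, 0x0D, 0x0A, 0x7F};
  const std::string payload(bytes, sizeof(bytes));
  const bool written = WriteBinary(path, payload);
  const std::string restored = ReadBinary(path);
  std::remove(path.c_str());  // clean up the temporary file
  return written && restored == payload;
}
```

Calling RoundTripIsExact with any writable path (for example a hypothetical "trt_roundtrip_demo.bin") should return true on both Windows and Linux, because both streams are opened with std::ios::binary.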

uubf1zoe #10

Hi, paddle-bot,
Why has nobody replied to this issue?

Thanks!

kg7wmglp #11

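Hi, paddle-bot,
I have run into the same problem. I urgently need this for a project going into production, and I hope Paddle technical support can resolve it soon.

Thanks!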

piv4azn7 #12

      // set use dynamic shape
      if (this->use_dynamic_shape_) {
        // set DynamicShape for image tensor
        const std::vector<int> min_input_shape = {1, 3, this->trt_min_shape_, this->trt_min_shape_};
        const std::vector<int> max_input_shape = {1, 3, this->trt_max_shape_, this->trt_max_shape_};
        const std::vector<int> opt_input_shape = {1, 3, this->trt_opt_shape_, this->trt_opt_shape_};
        const std::map<std::string, std::vector<int>> map_min_input_shape = {{"image", min_input_shape}};
        const std::map<std::string, std::vector<int>> map_max_input_shape = {{"image", max_input_shape}};
        const std::map<std::string, std::vector<int>> map_opt_input_shape = {{"image", opt_input_shape}};

        config.SetTRTDynamicShapeInfo(map_min_input_shape,
                                      map_max_input_shape,
                                      map_opt_input_shape);
        std::cout << "TensorRT dynamic shape enabled" << std::endl;
      }
    }
  } else {
    config.DisableGpu();
    if (this->use_mkldnn_) {
      config.EnableMKLDNN();
      // cache 10 different shapes for mkldnn to avoid memory leak
      config.SetMkldnnCacheCapacity(10);
    }
    config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
  }
  config.SwitchUseFeedFetchOps(false);
  config.SwitchIrOptim(true);
  // config.DisableGlogInfo();
  // Memory optimization
  config.pass_builder()->DeletePass("conv_bn_fuse_pass");
  config.EnableMemoryOptim();
  predictor_ = std::move(CreatePredictor(config));
}

chhqkbe1 #13

Are you satisfied with the resolution of your issue?

YES
No

4dc9hkyq #14

void ObjectDetector::LoadModel(const std::string& model_dir,
                               const int batch_size,
                               const std::string& run_mode) {
  paddle_infer::Config config;
  std::string prog_file = model_dir + OS_PATH_SEP + "model.pdmodel";
  std::string params_file = model_dir + OS_PATH_SEP + "model.pdiparams";
  config.SetModel(prog_file, params_file);
  if (this->use_gpu_) {
    // 200 MB initial GPU memory pool on the selected device
    config.EnableUseGpu(200, this->gpu_id_);
    config.SwitchIrOptim(true);
    // use tensorrt
    if (run_mode != "fluid") {
      auto precision = paddle_infer::Config::Precision::kFloat32;
      if (run_mode == "trt_fp32") {
        precision = paddle_infer::Config::Precision::kFloat32;
      } else if (run_mode == "trt_fp16") {
        precision = paddle_infer::Config::Precision::kHalf;
      } else if (run_mode == "trt_int8") {
        precision = paddle_infer::Config::Precision::kInt8;
      } else {
        printf("run_mode should be 'fluid', 'trt_fp32', 'trt_fp16' or 'trt_int8'");
      }
      // set tensorrt
      config.EnableTensorRtEngine(
          1 << 30,                   // workspace size in bytes (1 GiB)
          batch_size,                // max batch size
          this->min_subgraph_size_,  // min nodes per TensorRT subgraph
          precision,
          true,                      // use_static: serialize the engine for reuse
          this->trt_calib_mode_);    // use_calib_mode

      // set use dynamic shape
      if (this->use_dynamic_shape_) {
        // set DynamicShape for image tensor
        const std::vector<int> min_input_shape = {1, 3, this->trt_min_shape_, this->trt_min_shape_};
        const std::vector<int> max_input_shape = {1, 3, this->trt_max_shape_, this->trt_max_shape_};
        const std::vector<int> opt_input_shape = {1, 3, this->trt_opt_shape_, this->trt_opt_shape_};
        const std::map<std::string, std::vector<int>> map_min_input_shape = {{"image", min_input_shape}};
        const std::map<std::string, std::vector<int>> map_max_input_shape = {{"image", max_input_shape}};
        const std::map<std::string, std::vector<int>> map_opt_input_shape = {{"image", opt_input_shape}};

        config.SetTRTDynamicShapeInfo(map_min_input_shape,
                                      map_max_input_shape,
                                      map_opt_input_shape);
        std::cout << "TensorRT dynamic shape enabled" << std::endl;
      }
    }

  } else {
    config.DisableGpu();
    if (this->use_mkldnn_) {
      config.EnableMKLDNN();
      // cache 10 different shapes for mkldnn to avoid memory leak
      config.SetMkldnnCacheCapacity(10);
    }
    config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
  }
  config.SwitchUseFeedFetchOps(false);
  config.SwitchIrOptim(true);
  // config.DisableGlogInfo();
  // Memory optimization
  config.pass_builder()->DeletePass("conv_bn_fuse_pass");
  config.EnableMemoryOptim();
  predictor_ = std::move(CreatePredictor(config));
}

6za6bjd0 #15

It looks like this statement did not take effect. Could you post the config part of your code so we can take a look?
