1) PaddlePaddle version: 2.1
2) GPU: RTX 2080 Super, CUDA 10.2, cuDNN 7
3) OS: Windows 10
4) C++ inference library (version.txt):
GIT COMMIT ID: 4ccd9a0
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
CUDA version: 10.2
CUDNN version: v7.6
CXX compiler version: 19.16.27045.0
WITH_TENSORRT: ON
TensorRT version: v7
5) Inference library source: official download
Problem: On Windows 10, using the official detection inference library with TensorRT configured and the project generated with CMake, every run of the program regenerates the TensorRT serialized model. On Ubuntu this problem does not occur. In addition, generating the serialized model on Windows is very slow.
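For context, the knob that controls this cache in the C++ API is the `use_static` argument of `Config::EnableTensorRtEngine`, which makes Paddle-TRT serialize each engine into the model's `_opt_cache/` directory and reload it on later runs. A minimal sketch of such a setup (the model file names, workspace size, and batch size below are placeholders, not taken from this issue):

```cpp
#include "paddle_inference_api.h"  // paddle_infer::Config / CreatePredictor

int main() {
  paddle_infer::Config config;
  // Placeholder paths; only the model directory appears in the logs above.
  config.SetModel("./weight/ppyolov2_r50vd_dcn_365e_coco/model.pdmodel",
                  "./weight/ppyolov2_r50vd_dcn_365e_coco/model.pdiparams");
  config.EnableUseGpu(1000 /*initial GPU memory, MB*/, 0 /*device id*/);
  // use_static = true asks Paddle-TRT to serialize each sub-graph engine
  // into <model_dir>/_opt_cache/ and reload it on the next run.
  config.EnableTensorRtEngine(1 << 30 /*workspace bytes*/,
                              1 /*max_batch_size*/,
                              3 /*min_subgraph_size*/,
                              paddle_infer::PrecisionType::kHalf,
                              true /*use_static*/,
                              false /*use_calib_mode*/);
  auto predictor = paddle_infer::CreatePredictor(config);
  return predictor != nullptr ? 0 : 1;
}
```

The cache location can also be redirected with `config.SetOptimCacheDir(...)` if you prefer not to write next to the model files.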
31 answers
sqougxex #16
Judging from the log, the serialized files are being loaded normally; they are not being regenerated.
kr98yfug #17
Hi! We've received your issue; please be patient while we arrange for a technician to respond as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment & version info, and error messages. You may also check the API docs, FAQ, historical GitHub issues, and the AI community for answers. Have a nice day!
wtlkbnrh #18
These log lines all show the serialized files being loaded. Are these really the logs from Windows?
mmvthczy #19
I hit the same bug. On Ubuntu with static=true, the serialized files are reloaded correctly on repeated runs. On Windows 10 with static=true, the first run of the compiled program generates the serialized files as expected, but the second run does not load the previously generated files and regenerates them instead.
yebdmbv4 #20
However, every run of the program generates new TensorRT serialized files, so after repeated runs the files keep accumulating. In theory it should check whether serialized files already exist locally before generating new ones, rather than generating them every time; moreover, each run produces files with different names. Do you have a solution for this?
moiiocjp #21
Enabling serialization only requires the single setting static=true. The large number of serialized files is because the model is split into multiple TRT sub-graphs, and each sub-graph gets its own file.
fcy6dtqo #22
I set static=true in the program to save the TensorRT serialized model. Are there any other switches that need to be enabled? After running the program three times in a row, more than 50 serialized files had been generated.
dtcbnfnu #23
From the log, the serialized files are not being regenerated; they are loaded directly from the existing serialized files.
ttcibm8c #24
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0602 15:32:54.725275 11048 analysis_config.cc:424] use_dlnne_:0
I0602 15:32:54.725275 11048 analysis_config.cc:424] use_dlnne_:0
I0602 15:32:54.725275 11048 analysis_config.cc:424] use_dlnne_:0
I0602 15:32:54.725275 11048 analysis_config.cc:424] use_dlnne_:0
I0602 15:32:56.657642 11048 analysis_config.cc:424] use_dlnne_:0
I0602 15:32:56.658639 11048 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0602 15:32:56.703519 11048 analysis_predictor.cc:508] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0602 15:32:57.094473 11048 graph_pattern_detector.cc:91] --- detected 101 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0602 15:32:57.200225 11048 graph_pattern_detector.cc:91] --- detected 107 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0602 15:32:57.436590 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:32:57.438585 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:32:57.690877 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:33:12.438405 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_759475852344571393
I0602 15:33:12.438405 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:33:12.487272 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_11740562288058996496
I0602 15:33:12.488303 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:33:12.511207 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4939479952197085245
I0602 15:33:12.511207 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0602 15:33:12.512205 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:33:12.514200 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:33:31.955433 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_1030095015889920660
I0602 15:33:31.955433 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 7 nodes
I0602 15:33:31.956400 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:33:31.958428 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:33:56.991236 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_12655451223740963510
I0602 15:33:56.992218 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 7 nodes
I0602 15:33:57.053056 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_15206687998534825820
I0602 15:33:57.054054 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:33:57.055050 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:33:57.056047 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:34:10.793735 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_7806627406309375964
I0602 15:34:10.794734 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 110 nodes
I0602 15:34:10.807729 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:34:10.829638 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:39:16.048789 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_6270123181120463535
I0602 15:39:26.343456 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 10 nodes
I0602 15:39:26.422243 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_3499799567721492121
I0602 15:39:26.423240 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 10 nodes
I0602 15:39:26.487071 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_8028159102263197096
I0602 15:39:26.488067 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:39:26.489064 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:39:26.494051 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:39:41.457607 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_10846985153248968263
I0602 15:39:41.458603 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0602 15:39:41.462592 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_16018133220312938016
I0602 15:39:41.462592 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 8 nodes
I0602 15:39:41.467579 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_15733399061656221822
I0602 15:39:41.467579 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:39:41.468576 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:39:41.469574 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:39:55.194068 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_14319932622819613100
I0602 15:39:55.195066 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:39:55.214015 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_2654164431932322192
I0602 15:39:55.215013 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0602 15:39:55.216009 11048 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:39:55.217008 11048 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:40:14.356295 11048 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_18391883753208758622
I0602 15:40:14.357300 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0602 15:40:14.362279 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_10307500148995920653
I0602 15:40:14.362279 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0602 15:40:14.366267 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_1594808578423983610
I0602 15:40:14.367265 11048 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0602 15:40:14.381259 11048 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_75932943689739073
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0602 15:40:14.404167 11048 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0602 15:40:14.469990 11048 memory_optimize_pass.cc:199] Cluster name : concat_4.tmp_0 size: 19660800
I0602 15:40:14.469990 11048 memory_optimize_pass.cc:199] Cluster name : im_shape size: 8
I0602 15:40:14.469990 11048 memory_optimize_pass.cc:199] Cluster name : nearest_interp_v2_1.tmp_0 size: 6553600
I0602 15:40:14.469990 11048 memory_optimize_pass.cc:199] Cluster name : tanh_24.tmp_0 size: 3276800
I0602 15:40:14.470989 11048 memory_optimize_pass.cc:199] Cluster name : tanh_20.tmp_0 size: 3276800
I0602 15:40:14.472982 11048 memory_optimize_pass.cc:199] Cluster name : tmp_10 size: 1638400
I0602 15:40:14.472982 11048 memory_optimize_pass.cc:199] Cluster name : scale_factor size: 8
--- Running analysis [ir_graph_to_program_pass]
I0602 15:40:14.626572 11048 analysis_predictor.cc:595] ======= optimize end =======
I0602 15:40:14.627569 11048 naive_executor.cc:98] --- skip [feed], feed -> scale_factor
I0602 15:40:14.627569 11048 naive_executor.cc:98] --- skip [feed], feed -> image
I0602 15:40:14.628566 11048 naive_executor.cc:98] --- skip [feed], feed -> im_shape
I0602 15:40:14.635547 11048 naive_executor.cc:98] --- skip [concat_4.tmp_0], fetch -> fetch
I0602 15:40:14.635547 11048 naive_executor.cc:98] --- skip [nearest_interp_v2_1.tmp_0], fetch -> fetch
Successfully opened the dir !
total images = 9, batch_size = 1, total steps = 9
W0602 15:40:14.648514 11048 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.1, Runtime API Version: 10.2
W0602 15:40:14.648514 11048 device_context.cc:422] device: 0, cuDNN Version: 7.6.
W0602 15:40:14.923776 11048 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
(the same warning is repeated 19 times in total)
qco9c6ql #25
Hi, at the start of this issue you mentioned PaddlePaddle 2.1, but later you mention 1.8. Which version are you actually using?
If it is 1.8, please try switching to 2.1.
eqoofvh9 #26
This is the log of the first run with the paddle_inference.zip (CUDA 10.2) inference library:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0602 15:22:30.151445 12088 analysis_config.cc:424] use_dlnne_:0
I0602 15:22:30.151445 12088 analysis_config.cc:424] use_dlnne_:0
I0602 15:22:30.151445 12088 analysis_config.cc:424] use_dlnne_:0
I0602 15:22:30.151445 12088 analysis_config.cc:424] use_dlnne_:0
I0602 15:22:32.194638 12088 analysis_config.cc:424] use_dlnne_:0
I0602 15:22:32.194638 12088 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0602 15:22:32.239552 12088 analysis_predictor.cc:508] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0602 15:22:32.632467 12088 graph_pattern_detector.cc:91] --- detected 101 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0602 15:22:32.733230 12088 graph_pattern_detector.cc:91] --- detected 107 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0602 15:22:32.965576 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0602 15:22:32.967635 12088 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:22:33.219895 12088 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:22:53.548440 12088 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_7269016396610443345
I0602 15:22:53.549407 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 110 nodes
I0602 15:22:53.561410 12088 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:22:53.581355 12088 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:27:59.622704 12088 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4754089665498488738
I0602 15:27:59.630683 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 10 nodes
I0602 15:27:59.693531 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_3499799567721492121
I0602 15:27:59.694512 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 10 nodes
I0602 15:27:59.695509 12088 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:27:59.698503 12088 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:28:17.331161 12088 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_9324219253779922709
I0602 15:28:17.332129 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0602 15:28:17.333158 12088 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:28:17.336117 12088 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:28:36.849318 12088 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_11743801656035219158
I0602 15:28:36.850317 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 7 nodes
I0602 15:28:36.871258 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_11855779748307096751
I0602 15:28:36.872254 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 7 nodes
I0602 15:28:36.926110 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_15206687998534825820
I0602 15:28:36.927109 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:28:36.928105 12088 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:28:36.932096 12088 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:28:51.861507 12088 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_929599991364810441
I0602 15:28:51.862504 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:28:51.864500 12088 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:28:51.867491 12088 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:29:05.550750 12088 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4644331660945843395
I0602 15:29:05.551748 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0602 15:29:05.556735 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_16018133220312938016
I0602 15:29:05.557731 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 8 nodes
I0602 15:29:05.562723 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_15733399061656221822
I0602 15:29:05.562723 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:29:05.568702 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4806183877132512881
I0602 15:29:05.568702 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:29:05.583662 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_2654164431932322192
I0602 15:29:05.584659 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:29:05.585657 12088 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:29:05.587652 12088 engine.cc:86] Run Paddle-TRT FP16 mode
I0602 15:29:20.842391 12088 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_13281610464113284375
I0602 15:29:20.843389 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0602 15:29:20.848345 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_10307500148995920653
I0602 15:29:20.848345 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:29:20.856323 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_7456650529460881910
I0602 15:29:20.857321 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0602 15:29:20.861310 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_1594808578423983610
I0602 15:29:20.861310 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0602 15:29:20.882254 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4939479952197085245
I0602 15:29:20.883251 12088 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0602 15:29:20.896216 12088 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_75932943689739073
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0602 15:29:20.919155 12088 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0602 15:29:20.980989 12088 memory_optimize_pass.cc:199] Cluster name : concat_4.tmp_0 size: 19660800
I0602 15:29:20.980989 12088 memory_optimize_pass.cc:199] Cluster name : im_shape size: 8
I0602 15:29:20.981986 12088 memory_optimize_pass.cc:199] Cluster name : nearest_interp_v2_1.tmp_0 size: 6553600
I0602 15:29:20.981986 12088 memory_optimize_pass.cc:199] Cluster name : tanh_24.tmp_0 size: 3276800
I0602 15:29:20.981986 12088 memory_optimize_pass.cc:199] Cluster name : tanh_20.tmp_0 size: 3276800
I0602 15:29:20.981986 12088 memory_optimize_pass.cc:199] Cluster name : tmp_10 size: 1638400
I0602 15:29:20.981986 12088 memory_optimize_pass.cc:199] Cluster name : scale_factor size: 8
--- Running analysis [ir_graph_to_program_pass]
I0602 15:29:21.135617 12088 analysis_predictor.cc:595] ======= optimize end =======
I0602 15:29:21.135617 12088 naive_executor.cc:98] --- skip [feed], feed -> scale_factor
I0602 15:29:21.136574 12088 naive_executor.cc:98] --- skip [feed], feed -> image
I0602 15:29:21.139569 12088 naive_executor.cc:98] --- skip [feed], feed -> im_shape
I0602 15:29:21.144552 12088 naive_executor.cc:98] --- skip [concat_4.tmp_0], fetch -> fetch
I0602 15:29:21.144552 12088 naive_executor.cc:98] --- skip [nearest_interp_v2_1.tmp_0], fetch -> fetch
Successfully opened the dir !
total images = 9, batch_size = 1, total steps = 9
W0602 15:29:21.156556 12088 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.1, Runtime API Version: 10.2
W0602 15:29:21.156556 12088 device_context.cc:422] device: 0, cuDNN Version: 7.6.
W0602 15:29:21.812768 12088 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
(the same warning is repeated 19 times in total)
sulc1iza #27
I used two versions of the paddle_inference.zip C++ inference library, CUDA 10.2 and CUDA 10.0. With the 10.0 version, after TensorRT generates the serialized files, starting the program a second time reports the following error: E0602 15:15:20.104661 20144 helper.h:78] C:\source\rtSafe\coreReadArchive.cpp (55) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)
E0602 15:15:20.107652 20144 helper.h:78] INVALID_STATE: Unknown exception
E0602 15:15:20.108650 20144 helper.h:78] INVALID_CONFIG: Deserialize the cuda engine failed.
wooyq4lh #28
I call the Paddle Inference API directly.
This is version 1.8 on the second run: the log above is from the first run, when the files were generated, and the log below shows the error from the second run. With version 2.1, by contrast, the files are regenerated on every run.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0602 15:14:36.549612 20144 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0602 15:14:36.602471 20144 analysis_predictor.cc:522] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0602 15:14:37.383383 20144 graph_pattern_detector.cc:101] --- detected 101 subgraphs
e[32m--- Running IR pass [squeeze2_matmul_fuse_pass]e[0m
e[32m--- Running IR pass [reshape2_matmul_fuse_pass]e[0m
e[32m--- Running IR pass [flatten2_matmul_fuse_pass]e[0m
e[32m--- Running IR pass [map_matmul_to_mul_pass]e[0m
e[32m--- Running IR pass [fc_fuse_pass]e[0m
e[32m--- Running IR pass [conv_elementwise_add_fuse_pass]e[0m
I0602 15:14:37.492090 20144 graph_pattern_detector.cc:101] --- detected 107 subgraphs
e[32m--- Running IR pass [tensorrt_subgraph_pass]e[0m
I0602 15:14:37.773383 20144 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 7 nodes
I0602 15:14:37.776376 20144 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:14:38.031678 20144 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:15:00.974486 20144 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:15:00.975484 20144 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:15:00.977491 20144 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:15:20.102705 20144 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
E0602 15:15:20.104661 20144 helper.h:78] C:\source\rtSafe\coreReadArchive.cpp (55) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)
E0602 15:15:20.107652 20144 helper.h:78] INVALID_STATE: Unknown exception
E0602 15:15:20.108650 20144 helper.h:78] INVALID_CONFIG: Deserialize the cuda engine failed.
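For reference, engine serialization is controlled entirely from the predictor `Config`. A minimal sketch of the intended setup in the Paddle Inference 2.x C++ API (the model paths, workspace size, and cache directory below are placeholder assumptions, not values from this issue):

```cpp
// Sketch: enable Paddle-TRT with engine serialization (use_static = true).
#include "paddle_inference_api.h"

std::shared_ptr<paddle_infer::Predictor> BuildPredictor() {
  paddle_infer::Config config;
  config.SetModel("model.pdmodel", "model.pdiparams");  // placeholder paths
  config.EnableUseGpu(500 /*MB*/, 0 /*gpu id*/);
  // use_static = true asks Paddle to serialize the TRT engines to disk and
  // reload them on later runs instead of rebuilding them.
  config.EnableTensorRtEngine(1 << 30 /*workspace*/, 1 /*max batch*/,
                              3 /*min subgraph size*/,
                              paddle_infer::PrecisionType::kHalf,
                              true /*use_static*/, false /*use_calib_mode*/);
  // Pin the cache location explicitly; if the cache directory resolves
  // differently per launch (e.g. a relative path on Windows), the serialized
  // engines are never found and get regenerated every run.
  config.SetOptimCacheDir("./trt_cache");
  return paddle_infer::CreatePredictor(config);
}
```

Checking whether the second run actually writes into the same cache directory as the first is a quick way to distinguish a path problem from a deserialization bug.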
7vux5j2d29#
Are you deploying through the PaddleDetection C++ deployment suite, or calling the Paddle Inference API directly?
8ljdwjyq30#
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0602 15:00:13.355023 15880 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0602 15:00:13.406883 15880 analysis_predictor.cc:522] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0602 15:00:14.175864 15880 graph_pattern_detector.cc:101] --- detected 101 subgraphs
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0602 15:00:14.285533 15880 graph_pattern_detector.cc:101] --- detected 107 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0602 15:00:14.566781 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:14.575757 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:15.157200 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:00:30.641460 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:30.642489 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:30.643487 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:00:44.689169 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:44.690166 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:44.692162 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:00:58.561785 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:00:58.563781 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:00:58.564779 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:01:13.806190 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
I0602 15:01:13.807195 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:01:13.808144 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:01:18.347720 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 110 nodes
I0602 15:01:18.362681 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:01:18.384621 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:06:32.004756 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 10 nodes
I0602 15:06:32.006747 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:06:32.009743 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:06:49.696780 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 10 nodes
I0602 15:06:49.698774 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:06:49.701767 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:07:07.526752 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:07:07.527715 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:07:07.529747 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:07:27.537101 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 7 nodes
I0602 15:07:27.538098 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:07:27.540093 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:07:53.054272 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 7 nodes
I0602 15:07:53.055269 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:07:53.059258 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:15.018090 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
I0602 15:08:15.019086 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:15.019086 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:19.482378 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:08:19.483381 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:19.484373 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:33.088526 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:08:33.089483 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:33.090479 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:08:52.708403 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:08:52.709401 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:08:52.710398 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:09:06.350334 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 4 nodes
I0602 15:09:06.351332 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:09:06.352332 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:09:10.719928 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 6 nodes
I0602 15:09:10.720925 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:09:10.722921 15880 engine.cc:88] Run Paddle-TRT FP16 mode
I0602 15:09:25.621057 15880 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 5 nodes
I0602 15:09:25.622053 15880 tensorrt_subgraph_pass.cc:347] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0602 15:09:25.623050 15880 engine.cc:88] Run Paddle-TRT FP16 mode
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0602 15:09:34.398572 15880 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0602 15:09:34.463398 15880 memory_optimize_pass.cc:201] Cluster name : concat_4.tmp_0 size: 19660800
I0602 15:09:34.463398 15880 memory_optimize_pass.cc:201] Cluster name : nearest_interp_v2_1.tmp_0 size: 6553600
I0602 15:09:34.464413 15880 memory_optimize_pass.cc:201] Cluster name : im_shape size: 8
I0602 15:09:34.464413 15880 memory_optimize_pass.cc:201] Cluster name : transpose_1.tmp_0 size: 211200
I0602 15:09:34.466395 15880 memory_optimize_pass.cc:201] Cluster name : softplus_20.tmp_0 size: 3276800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : transpose_0.tmp_0 size: 52800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : tanh_20.tmp_0 size: 3276800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : yolo_box_1.tmp_0 size: 76800
I0602 15:09:34.467388 15880 memory_optimize_pass.cc:201] Cluster name : yolo_box_0.tmp_0 size: 19200
I0602 15:09:34.468385 15880 memory_optimize_pass.cc:201] Cluster name : scale_factor size: 8
I0602 15:09:34.468385 15880 memory_optimize_pass.cc:201] Cluster name : cast_0.tmp_0 size: 8
--- Running analysis [ir_graph_to_program_pass]
I0602 15:09:34.637933 15880 analysis_predictor.cc:598] ======= optimize end =======
I0602 15:09:34.637933 15880 naive_executor.cc:107] --- skip [feed], feed -> scale_factor
I0602 15:09:34.638929 15880 naive_executor.cc:107] --- skip [feed], feed -> image
I0602 15:09:34.638929 15880 naive_executor.cc:107] --- skip [feed], feed -> im_shape
I0602 15:09:34.646908 15880 naive_executor.cc:107] --- skip [nearest_interp_v2_1.tmp_0], fetch -> fetch
I0602 15:09:34.646908 15880 naive_executor.cc:107] --- skip [concat_4.tmp_0], fetch -> fetch
W0602 15:09:34.656881 15880 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.1, Runtime API Version: 10.0
W0602 15:09:34.657894 15880 device_context.cc:372] device: 0, cuDNN Version: 7.6.
W0602 15:09:35.054816 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.060801 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.063792 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.066797 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.069777 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.072768 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.076758 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.086731 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.088726 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.090720 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.094712 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.096704 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.098709 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.102689 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.104683 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.106678 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.108673 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0602 15:09:35.111665 15880 helper.h:74] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Inference: 64.061005 ms per batch image
class=2 confidence=0.2698 rect=[193 204 156 202]
Visualized output saved as output\126_1.jpg
Inference: 22.954700 ms per batch image
class=2 confidence=0.2224 rect=[193 204 184 231]
Visualized output saved as output\126_2.jpg
Inference: 21.290199 ms per batch image
class=0 confidence=0.2029 rect=[249 621 0 175]
Visualized output saved as output\126_5.jpg
Inference: 21.661400 ms per batch image
Visualized output saved as output\35_1.jpg
Inference: 21.788601 ms per batch image
Visualized output saved as output\35_2.jpg
Inference: 22.311899 ms per batch image
Visualized output saved as output\35_3.jpg
Inference: 21.007299 ms per batch image
Visualized output saved as output\35_4.jpg
Inference: 21.519300 ms per batch image
Visualized output saved as output\39_1.jpg
Inference: 22.053101 ms per batch image
class=0 confidence=0.2046 rect=[259 426 9 214]
Visualized output saved as output\39_2.jpg
Inference: 20.740301 ms per batch image
class=0 confidence=0.3153 rect=[228 275 5 191]
Visualized output saved as output\39_3.jpg
D:\projects\PaddleDetection\deploy\cpp\Release\main.exe (process 1352) exited with return code -1073740791.