Paddle cuDNN error, CUDNN_STATUS_BAD_PARAM: frequent core dumps when process_num>1

tyg4sfes  posted on 2022-04-21  in Java

C++ service built with gcc82:
Inference library: https://irepo.baidu-int.com/rest/prod/v3/baidu/vis-open/paddle/releases/1.8.4.1/files
paddle1.8.4
cuda10.1
libpaddle_fluid_gpu_cudnn7.so
libcudnn.so.7.5.0

With process_num=1, concurrent requests run without errors, but with process_num>1 the service frequently core dumps.
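One way to test whether the crash is a race between workers is to temporarily serialize every inference call behind a mutex and see whether the core dumps stop. The sketch below assumes that process_num controls worker threads sharing one predictor, which the post does not confirm. `FakePredictor` and `SerializedPredictor` are hypothetical names used for illustration; they are not part of Paddle or of the service code.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the service's predictor wrapper; the real
// Predictor::predict from predictor.cpp is not shown in the post.
struct FakePredictor {
    std::vector<float> predict(const std::vector<float>& input) {
        return input;  // echo the input in place of real inference
    }
};

// Wraps a predictor-like object so that only one thread at a time
// can be inside predict().
template <typename P>
class SerializedPredictor {
public:
    explicit SerializedPredictor(P predictor)
        : predictor_(std::move(predictor)) {}

    std::vector<float> predict(const std::vector<float>& input) {
        std::lock_guard<std::mutex> lock(mutex_);  // released on return
        return predictor_.predict(input);
    }

private:
    std::mutex mutex_;
    P predictor_;
};
```

If the crash disappears with the lock in place, shared state between workers is a likely suspect. The lock is a diagnostic aid only, since it removes the concurrency that process_num>1 was meant to provide.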

Switching to CUDA 10.2 and cuDNN 7.6 produces the same error:
export LD_LIBRARY_PATH=/usr/lib64/:./so/:/home/work/cuda-10.2/lib64/:/home/work/cudnn/cudnn_v7.6/cuda/lib64/
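Since the search path above mixes several directories, it may help to confirm which of them actually contain a cuDNN library. The sketch below splits a colon-separated path and lists files whose names start with a given prefix, visiting the directories in search order. `split_search_path` and `find_libraries` are illustrative helper names, not an existing API. Empty path entries are skipped for simplicity.

```cpp
#include <dirent.h>

#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated search path, skipping empty entries.
std::vector<std::string> split_search_path(const std::string& value) {
    std::vector<std::string> dirs;
    std::stringstream stream(value);
    std::string dir;
    while (std::getline(stream, dir, ':')) {
        if (!dir.empty()) dirs.push_back(dir);
    }
    return dirs;
}

// List files whose names start with `prefix` (e.g. "libcudnn.so"),
// visiting the directories in the order given. Directories that are
// missing or unreadable are skipped.
std::vector<std::string> find_libraries(const std::vector<std::string>& dirs,
                                        const std::string& prefix) {
    std::vector<std::string> hits;
    for (const auto& dir : dirs) {
        DIR* handle = opendir(dir.c_str());
        if (handle == nullptr) continue;
        while (const dirent* entry = readdir(handle)) {
            const std::string name = entry->d_name;
            if (name.compare(0, prefix.size(), prefix) == 0) {
                hits.push_back(dir + "/" + name);
            }
        }
        closedir(handle);
    }
    return hits;
}
```

A caller could pass the value of `std::getenv("LD_LIBRARY_PATH")` to `split_search_path` and then search for `"libcudnn.so"`. This gives only a partial picture, because the dynamic loader also consults other sources such as rpath entries and the ld.so cache.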

Error log:
I0915 18:53:13.308756 10415 predictor.cpp:113] Predictor::predict
I0915 18:53:13.308779 10415 predictor.cpp:28] Predictor::feed
W0915 18:53:13.308849 10415 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.1
W0915 18:53:13.308980 10415 device_context.cc:260] device: 0, cuDNN Version: 7.6.
find fluid handle
width224 height 224 channels 3
I0915 18:53:13.316826 10414 predictor.cpp:113] Predictor::predict
I0915 18:53:13.316859 10414 predictor.cpp:28] Predictor::feed
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what():

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::TensorDescriptor::set(paddle::framework::Tensor const&, cudnnTensorFormat_t)
3 paddle::operators::CUDNNConvOpKernel::Compute(paddle::framework::ExecutionContext const&) const
4 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CUDNNConvOpKernel<float>, paddle::operators::CUDNNConvOpKernel<double>, paddle::operators::CUDNNConvOpKernel<paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8 paddle::framework::NaiveExecutor::Run()
9 paddle::AnalysisPredictor::ZeroCopyRun()

Python Call Stacks (More useful to users):

File "/home/ssd9/nizihan/env/python-gcc482-paddle-1.7.1/lib/python2.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/home/ssd9/nizihan/env/python-gcc482-paddle-1.7.1/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args,**kwargs)
File "/home/ssd9/nizihan/env/python-gcc482-paddle-1.7.1/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 1405, in conv2d
"data_format": data_format,
File "/home/ssd9/nizihan/plate/train_code/plate_type_classification_degree/models/resnet_vd.py", line 146, in conv_bn_layer
bias_attr=False)
File "/home/ssd9/nizihan/plate/train_code/plate_type_classification_degree/models/resnet_vd.py", line 67, in net
name='conv1_1')
File "infer.py", line 100, in infer
out = model.net(input=image, class_dim=args.class_dim)
File "infer.py", line 204, in main
infer(args)
File "infer.py", line 208, in <module>
main()

Error Message Summary:

ExternalError: Cudnn error, CUDNN_STATUS_BAD_PARAM at (/home/scmbuild/workspaces_cluster.dev/baidu.lib.paddlepaddle/baidu/lib/paddlepaddle/Paddle/paddle/fluid/platform/cudnn_desc.h:155)
[operator < conv2d > error]
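The stack shows the failure inside `TensorDescriptor::set`, which builds a cuDNN tensor descriptor from the tensor's dimensions. As far as I know, an invalid extent such as a zero or negative dimension is one documented cause of CUDNN_STATUS_BAD_PARAM from the cuDNN descriptor setters. Whether that is what happens here is an assumption. A minimal sketch of a check that could run just before feeding the input follows; `shape_matches_buffer` is a hypothetical helper, not a Paddle function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the shape is non-empty, every extent is positive, and
// the product of the extents equals the number of elements supplied.
bool shape_matches_buffer(const std::vector<int>& shape,
                          std::size_t buffer_len) {
    if (shape.empty()) return false;
    std::uint64_t count = 1;
    for (int extent : shape) {
        if (extent <= 0) return false;  // zero or negative extent
        count *= static_cast<std::uint64_t>(extent);
    }
    return count == buffer_len;
}
```

For the input logged above (width 224, height 224, channels 3, batch size assumed to be 1), the shape `{1, 3, 224, 224}` should match a buffer of 150528 floats. Logging a failed check per worker would show whether a shape is being corrupted when several workers run at once.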

00jrzges  #1

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You can also look for an answer in the official API documentation, the FAQ, past GitHub Issues, and the AI community. Have a nice day!
