Paddle: model evaluation fails with CUDNN_STATUS_NOT_SUPPORTED

Posted by osh3o9ms on 2022-10-20 in Other

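To get your problem resolved quickly, please search for similar problems before opening an issue: search issue keywords, filter by labels, or check the official documentation.

When opening an issue, please provide the following information so that it can be resolved quickly:

  • Title: a concise, accurate description of the problem, e.g. "ssd model with a preceding lstm reports an error"
  • Version and environment information:

   1) PaddlePaddle version: paddle1.7.1
   2) CPU: please provide the CPU model and which math libraries are in use (MKL/OpenBlas/MKLDNN, etc.)
   3) GPU: CUDA9.0 CUDNN7.3 V100
   4) System environment: CentOS python3.6

  • Model information

   1) Model name 2) Dataset name 3) Algorithm name 4) Model link

  • Reproduction information: for an error, please give the reproduction environment and the steps to reproduce
  • Problem description: please describe the problem in detail and include the error message and the key parts of the logs/code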

When I run evaluation for the unet model from the segmentation library (PaddleSeg) with batch_size=16, I get the following error:
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 45, in append_op
return self.main_program.current_block().append_op(*args,**kwargs)
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 1405, in conv2d
"data_format": data_format,
File "/ssd1/xiege/test/PaddleSeg/pdseg/models/libs/model_libs.py", line 127, in conv
return fluid.layers.conv2d(*args,**kargs)
File "/ssd1/xiege/test/PaddleSeg/pdseg/models/modeling/unet.py", line 39, in double_conv
conv(data, out_ch, 3, stride=1, padding=1, param_attr=param_attr))
File "/ssd1/xiege/test/PaddleSeg/pdseg/models/modeling/unet.py", line 83, in encode
data = double_conv(data, 64)
File "/ssd1/xiege/test/PaddleSeg/pdseg/models/modeling/unet.py", line 128, in unet
encode_data, short_cuts = encode(input)
File "/ssd1/xiege/test/PaddleSeg/pdseg/models/model_builder.py", line 75, in seg_model
logits = unet.unet(image, class_num)
File "/ssd1/xiege/test/PaddleSeg/pdseg/models/model_builder.py", line 220, in build_model
logits = seg_model(image, class_num)
File "pdseg/eval.py", line 95, in evaluate
test_prog, startup_prog, phase=ModelPhase.EVAL)
File "pdseg/eval.py", line 174, in main
evaluate(cfg,**args.dict)
File "pdseg/eval.py", line 178, in <module>
main()

Error Message Summary:

Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.

[Hint: CUDNN_STATUS_NOT_SUPPORTED] at (/paddle/paddle/fluid/platform/cudnn_desc.h:155)
[operator < conv2d > error]
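With batch_size=1 evaluation runs normally, and with batch_size=8 it fails with an out-of-GPU-memory error, so I would expect batch_size=16 to fail with an out-of-memory error as well. I don't know why it reports this error instead; please help take a look.
I found an earlier issue with a similar error, "paddle reports CUDNN_STATUS_NOT_SUPPORTED" #20087, but it was never resolved.
Config file attached:
unet.txt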

cvxl0en2  1#

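  1. Please post a more detailed log.
  2. Check whether the cuDNN version your Paddle installation was built for matches the cuDNN version installed locally.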
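
One way to check the local side of point 2 is to ask the cuDNN shared library for its own version. The sketch below is only an illustration: the library name `libcudnn.so` and both helper names are assumptions, and the decoding follows the cuDNN 7.x numbering (major*1000 + minor*100 + patch). It does not query Paddle; the result still has to be compared by hand with the cuDNN version stated for the installed Paddle package.

```python
import ctypes


def decode_cudnn_version(version):
    """Split a cuDNN 7.x version integer (major*1000 + minor*100 + patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def local_cudnn_version(lib_name="libcudnn.so"):
    """Load the cuDNN library found by the dynamic loader and query its version.

    lib_name is an assumption; adjust it to the file actually installed.
    """
    lib = ctypes.CDLL(lib_name)  # raises OSError if the library cannot be loaded
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    try:
        print("local cuDNN version:", local_cudnn_version())
    except OSError as exc:
        print("could not load cuDNN:", exc)
```

Because the library is resolved through the normal loader search path, this reports the cuDNN that a process started from the same shell would most likely pick up, which may differ from other copies installed on the machine.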

t3irkdon  2#

1. The upper part of the error output is:
load test model: saved/unet/final
/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/executor.py:782: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "pdseg/eval.py", line 178, in <module>
main()
File "pdseg/eval.py", line 174, in main
evaluate(cfg,**args.dict)
File "pdseg/eval.py", line 135, in evaluate
test_prog, fetch_list=fetch_list, return_numpy=True)
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/executor.py", line 783, in run
six.reraise(*sys.exc_info())
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/executor.py", line 778, in run
use_program_cache=use_program_cache)
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/executor.py", line 831, in _run_impl
use_program_cache=use_program_cache)
File "/opt/_internal/cpython-3.6.0/lib/python3.6/site-packages/paddle/fluid/executor.py", line 905, in _run_program
fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::TensorDescriptor::set(paddle::framework::Tensor const&, cudnnTensorFormat_t)
3 paddle::operators::CUDNNConvOpKernel::Compute(paddle::framework::ExecutionContext const&) const
4 ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EJNS0_9operators17CUDNNConvOpKernelIfEENSA_IdEENSA_INS7_7float16EEEEEclEPKcSH_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
9 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)
2. The cuDNN versions match.
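
The C++ call stack above fails inside `TensorDescriptor::set`, that is, while a cuDNN tensor descriptor is being created for the conv2d op. The cuDNN documentation lists a tensor whose total size exceeds 2 Giga-elements as one reason for `CUDNN_STATUS_NOT_SUPPORTED` when setting a descriptor. Whether that is the cause here is not confirmed anywhere in this thread. The sketch below only shows how to check the element count of the first encoder output (64 channels, from `double_conv(data, 64)`) for a given batch size; the image sizes are hypothetical because the evaluation image size is not given, and the exact threshold constant is an assumption.

```python
# Assumption: the "2 Giga-elements" limit is treated as the int32 maximum.
CUDNN_MAX_ELEMENTS = 2**31 - 1


def tensor_elements(n, c, h, w):
    """Number of elements in an NCHW tensor."""
    return n * c * h * w


def exceeds_cudnn_limit(n, c, h, w):
    """True if an NCHW tensor has more elements than the assumed cuDNN limit."""
    return tensor_elements(n, c, h, w) > CUDNN_MAX_ELEMENTS


if __name__ == "__main__":
    channels = 64  # output channels of the first double_conv in the traceback
    for height, width in [(1024, 1024), (2048, 2048)]:  # hypothetical sizes
        for batch in (1, 8, 16):
            count = tensor_elements(batch, channels, height, width)
            flag = exceeds_cudnn_limit(batch, channels, height, width)
            print(batch, height, width, count, "over limit" if flag else "ok")
```

If the real evaluation images are large enough for the batch_size=16 tensor to cross this limit, the descriptor would be rejected before any memory is allocated, which would be consistent with seeing this error rather than an out-of-memory error. This remains a hypothesis to check against the actual image size in the config.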

62lalag4  3#

The log is incomplete. Please post the full log.

pgx2nnw8  5#

Could you try updating cuDNN to v7.5?

xa9qqrwz  6#

I tried cuDNN v7.5 and v7.6; both give the same error.
