Paddle OSError: (External) CUDA error(222), the provided PTX was compiled with an unsupported toolchain..

ccgok5k5 posted 5 months ago in Other

Issue Description

I built and installed Paddle on an A100 machine with the following commands:
WITH_GPU=ON
WITH_DISTRIBUTE=ON
WITH_MKL=ON
DWITH_GLOO=ON
cmake .. -DCMAKE_INSTALL_PREFIX=./output/ \
    -DCMAKE_BUILD_TYPE=Release \
    -DWITH_PYTHON=ON \
    -DWITH_MKL=$WITH_MKL \
    -DWITH_GPU=$WITH_GPU \
    -DCUDA_ARCH_NAME=Auto \
    -DON_INFER=ON \
    -DWITH_TESTING=ON \
    -DWITH_DISTRIBUTE=$WITH_DISTRIBUTE \
    -DPY_VERSION=3.7 \
    -DWITH_GLOO=$DWITH_GLOO \
    -DWITH_TENSORRT=ON \
    -DTENSORRT_ROOT=/usr/local/TensorRT-8.6.1.6/

Running it fails with this error:
  File "/root/miniconda3/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 703, in __init__
    data_format=data_format,
  File "/root/miniconda3/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 159, in __init__
    default_initializer=_get_default_param_initializer(),
  File "/root/miniconda3/lib/python3.7/site-packages/paddle/nn/layer/layers.py", line 715, in create_parameter
    temp_attr, shape, dtype, is_bias, default_initializer
  File "/root/miniconda3/lib/python3.7/site-packages/paddle/fluid/layer_helper_base.py", line 431, in create_parameter
    **attr._to_kwargs(with_initializer=True)
  File "/root/miniconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3949, in create_parameter
    initializer(param, self)
  File "/root/miniconda3/lib/python3.7/site-packages/paddle/nn/initializer/initializer.py", line 40, in __call__
    return self.forward(param, block)
  File "/root/miniconda3/lib/python3.7/site-packages/paddle/nn/initializer/normal.py", line 77, in forward
    place,
OSError: (External) CUDA error(222), the provided PTX was compiled with an unsupported toolchain..
[Hint: 'cudaErrorUnsupportedPtxVersion'. This indicates that the provided PTX was compiled with an unsupported toolchain. The most common reason for this, is the PTXwas generated by a compiler newer than what is supported by the CUDA driver and PTX JIT compiler.] (at /root/paddlejob/workspace/env_run/zhangyaxian/Paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:209)

Any help resolving this would be appreciated. Thanks.

Version & Environment Information

Paddle version: develop
cuda: 11.4

628mspwn #1

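Hello. According to the answer at https://forums.developer.nvidia.com/t/provided-ptx-was-compiled-with-an-unsupported-toolchain-error-using-cub/168292, this problem is caused by a mismatch between the driver version and the nvcc version in your environment. As suggested there, you need to upgrade the driver on your machine.

This document lists which driver versions match which CUDA versions: https://docs.nvidia.com/deploy/cuda-compatibility/index.html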

pgpifvop #2

My CUDA version and driver do match.

rm5edbpk #3

@XYZ-916 Based on this error hint:

This indicates that the provided PTX was compiled with an unsupported toolchain. The most common reason for this, is the PTXwas generated by a compiler newer than what is supported by the CUDA driver and PTX JIT compiler.

the nvcc in your build environment is most likely newer than what your driver supports. This can happen when the nvcc version in your environment does not match CUDA 11.4. Could you check the nvcc version in your build environment?

# Check where nvcc is located
which nvcc
# Output similar to the following
/usr/local/cuda/bin/nvcc

# Check the nvcc version
nvcc --version
# Output similar to the following
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

If the nvcc version is also fine, I suggest building or running inside NVIDIA's official image nvidia/cuda:11.4.3-cudnn8-devel-ubuntu18.04 to rule out environment problems.
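The check described above can also be scripted. The following is a minimal sketch, not part of Paddle: the function names are hypothetical, and it assumes that `nvcc --version` prints a `release X.Y` field and that the `nvidia-smi` header prints `CUDA Version: X.Y`, as in the outputs quoted in this thread. It only compares the two version numbers; for a prebuilt wheel, the CUDA version the wheel was built with is probably what matters, not the local nvcc.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_nvcc_release(text: str) -> Optional[Version]:
    """Return (major, minor) from the 'release X.Y' field of `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_driver_cuda(text: str) -> Optional[Version]:
    """Return (major, minor) from the 'CUDA Version: X.Y' field of the `nvidia-smi` header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def nvcc_newer_than_driver(nvcc_text: str, smi_text: str) -> bool:
    """True when nvcc reports a newer CUDA release than the driver advertises."""
    nvcc = parse_nvcc_release(nvcc_text)
    driver = parse_driver_cuda(smi_text)
    if nvcc is None or driver is None:
        raise ValueError("could not find a version in the given output")
    # Tuples compare element by element, so (11, 7) > (11, 4).
    return nvcc > driver
```

To use it on a real machine, pass in the captured text of the two commands, for example via `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`. A result of `True` suggests the situation described in the error hint.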

ycl3bljg #4

Has this problem been resolved? I get this error when running inference with PaddleOCR. Here is my environment information:

gym@stresstest:~/PaddleOCR$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
gym@stresstest:~$ nvidia-smi
Mon Aug 21 15:05:10 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
wecizke3 #5

I solved my problem! nvidia-smi shows the following:
Fri Jul 12 19:24:40 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+

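The driver reports CUDA Version 11.4, which means you can install the paddlepaddle post112 build (CUDA 11.2). I had installed post117 (CUDA 11.7), which is newer than the driver supports, and that is how I ran into this problem.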

https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

wget https://paddle-wheel.bj.bcebos.com/develop/linux/linux-gpu-cuda11.2-cudnn8-mkl-gcc8.2-avx/paddlepaddle_gpu-0.0.0.post112-cp39-cp39-linux_x86_64.whl

python -m pip install paddlepaddle-gpu==0.0.0.post112 -f paddlepaddle_gpu-0.0.0.post112-cp39-cp39-linux_x86_64.whl
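The choice of post112 over post117 can be sketched as a small helper. This is an illustration only: the function names are hypothetical, the list of tags passed in is an example rather than the actual set of published wheels, and it assumes the tag encodes the CUDA version as a two-digit major followed by the minor (post112 for 11.2, post117 for 11.7).

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int]


def post_tag_to_version(tag: str) -> Version:
    """Convert a tag such as 'post112' to (11, 2).

    Assumption: the digits are a two-digit major version followed by the minor.
    """
    if not tag.startswith("post"):
        raise ValueError(f"unexpected tag: {tag!r}")
    digits = tag[len("post"):]
    if len(digits) < 3 or not digits.isdigit():
        raise ValueError(f"unexpected tag: {tag!r}")
    return int(digits[:2]), int(digits[2:])


def pick_post_tag(driver_cuda: Version, available: Iterable[str]) -> Optional[str]:
    """Pick the newest tag whose CUDA version does not exceed the driver's."""
    candidates = [t for t in available if post_tag_to_version(t) <= driver_cuda]
    return max(candidates, key=post_tag_to_version) if candidates else None


# Example tags only; check the wheel index for what is actually published.
example_tags = ["post112", "post116", "post117"]
print(pick_post_tag((11, 4), example_tags))
```

With a driver that reports CUDA 11.4, the helper skips post116 and post117 and settles on post112, which matches the wheel installed above.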

Then run check.py:
import paddle
paddle.utils.run_check()

The output is as follows, and the problem is fixed:

Running verify PaddlePaddle program ...
I0712 19:22:10.413522 124943 program_interpreter.cc:243] New Executor is Running.
W0712 19:22:10.414631 124943 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.4, Runtime API Version: 11.7
W0712 19:22:10.415433 124943 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
I0712 19:22:23.808157 124943 interpreter_util.cc:646] Standalone Executor is Used.
PaddlePaddle works well on 1 GPU.
