After installing PaddlePaddle, run_check reports that libcublas.so cannot be found

raogr8fs, posted 4 months ago in Other


Hi team~

CUDA and cuDNN are already installed, but the check fails with the following error:

import paddle

paddle.utils.run_check()

Error:

RuntimeError: (PreconditionNotMet) The third-party dynamic library (libcublas.so) that Paddle depends on is not configured correctly. (error code is libcublas.so: cannot open shared object file: No such file or directory)
Suggestions:

  1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
  2. Configure third-party dynamic library environment variables as follows:
  • Linux: set LD_LIBRARY_PATH by export LD_LIBRARY_PATH=...
  • Windows: set PATH by `set PATH=XXX; (at /paddle/paddle/phi/backends/dynload/dynamic_loader.cc:303)

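However, this .so file does exist under the CUDA install path (/usr/local/cuda/lib64), and the LD_LIBRARY_PATH and CUDA_HOME environment variables are set correctly.

The version I installed is as follows: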

python -m pip install paddlepaddle-gpu==2.3.0.post110 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

pobjuy32 1#

@WystanW Could you first locate where libcublas.so is installed in your environment? For example:

$ cd /usr/
$ find ./ -name "libcublas.so"
./local/cuda-11.4/targets/x86_64-linux/lib/stubs/libcublas.so
./local/cuda-11.4/targets/x86_64-linux/lib/libcublas.so

Once you have found libcublas.so, set an environment variable using one of the two methods below, then rerun run_check. One of the two should let Paddle find libcublas.so.

# Method 1: set LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/targets/x86_64-linux/lib:${LD_LIBRARY_PATH}

# Method 2: set FLAGS_cuda_dir
export FLAGS_cuda_dir=/usr/local/cuda-11.4/targets/x86_64-linux/lib
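
Before changing any variables, it can help to see what each directory on LD_LIBRARY_PATH actually contains. The following is a minimal sketch, not part of Paddle: the function name scan_library_path and its parameters are made up for illustration. It reports, for each directory, whether the exact name libcublas.so is present or only versioned files such as libcublas.so.11.

```python
import os
from pathlib import Path


def scan_library_path(lib_name="libcublas.so", search_path=None):
    """List directories that hold lib_name or versioned variants of it.

    Hypothetical helper for illustration; returns a list of
    (directory, exact_name_found, versioned_file_names) tuples.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    report = []
    for entry in search_path.split(":"):
        if not entry:
            continue
        directory = Path(entry)
        if not directory.is_dir():
            continue
        exact = directory / lib_name
        # Path.exists() follows symlinks, so a dangling link counts as missing.
        has_exact = exact.exists()
        versioned = sorted(p.name for p in directory.glob(lib_name + ".*"))
        if has_exact or versioned or exact.is_symlink():
            report.append((str(directory), has_exact, versioned))
    return report


if __name__ == "__main__":
    for directory, has_exact, versioned in scan_library_path():
        print(directory, "exact name found:", has_exact, "versioned:", versioned)
```

A directory that shows versioned files but no exact name would explain the error even when LD_LIBRARY_PATH itself looks correct.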

ljo96ir5 2#

@qili93 Hello, my LD_LIBRARY_PATH output does include the CUDA lib directory, and that directory contains a libcublas.so file.

xienkqul 3#

@qili93 But the files I have are libcublas.so.11 and libcublas.so.11.2.0.252. Do I need to create a symlink?

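x3naxklr 4#

@qili93 But the files I have are libcublas.so.11 and libcublas.so.11.2.0.252. Do I need to create a symlink?

Yes. Paddle looks up the library under the name libcublas.so; see the lookup code at https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/backends/dynload/dynamic_loader.cc#L323. The library has to be linked under the name libcublas.so in your environment, otherwise you will get the not-found error.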
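
As a rough illustration of the symlink step, here is a minimal Python sketch. The helper name link_unversioned is hypothetical, and the paths in the usage note are the ones reported in this thread, so adjust them to your own machine. It refuses to create a link whose target does not exist, which avoids a dangling symlink.

```python
from pathlib import Path


def link_unversioned(versioned_lib, link_dir, link_name="libcublas.so"):
    """Create link_dir/link_name as a symlink to versioned_lib.

    Hypothetical helper for illustration; returns the path of the new link.
    """
    target = Path(versioned_lib).resolve()
    if not target.is_file():
        # Refuse to create a link that would point at nothing.
        raise FileNotFoundError(f"target library does not exist: {target}")
    link = Path(link_dir) / link_name
    if link.is_symlink():
        # Replace a stale symlink; a regular file is never overwritten.
        link.unlink()
    link.symlink_to(target)
    return link
```

For example, link_unversioned("/usr/local/cuda/lib64/libcublas.so.11.2.0.252", "/projects/zhenwei") would create /projects/zhenwei/libcublas.so. The directory holding the link still has to be on LD_LIBRARY_PATH before the Python process starts.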

vpfxa7rd 5#

@qili93 After doing this, the run just ends with Killed. Where can I see the error message?

Running verify PaddlePaddle program ... 
W0731 01:56:53.850996   590 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.0
W0731 01:56:53.854722   590 gpu_context.cc:306] device: 0, cuDNN Version: 8.0.
Killed

dba5bblo 6#

@qili93 It still cannot find the file, even though the .so file is clearly on the path. I printed the path out to check.

The code is as follows:

import paddle
import os

print(os.environ.get('LD_LIBRARY_PATH', ''))
paddle.utils.run_check()
  • The output is as follows
notebooks_research@jupyter-zhenwei--zhenweicv:/projects/zhenwei$ python test.py 
WARNING: OMP_NUM_THREADS set to 4, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.
/opt/conda/lib/python3.8/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/opt/teradata/client/16.20/odbc_64/lib:/usr/lib:/opt/teradata/client/16.20/lib:/usr/lib/oracle/11.2/client64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/projects/zhenwei
Running verify PaddlePaddle program ... 
W0731 02:21:15.118137   346 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.0
W0731 02:21:15.121729   346 gpu_context.cc:306] device: 0, cuDNN Version: 8.0.
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    paddle.utils.run_check()
  File "/opt/conda/lib/python3.8/site-packages/paddle/utils/install_check.py", line 266, in run_check
    _run_static_single(use_cuda, use_xpu, use_npu)
  File "/opt/conda/lib/python3.8/site-packages/paddle/utils/install_check.py", line 170, in _run_static_single
    exe.run(startup_prog)
  File "/opt/conda/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1299, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1285, in run
    res = self._run_impl(
  File "/opt/conda/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1510, in _run_impl
    return self._run_program(
  File "/opt/conda/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1607, in _run_program
    self._default_executor.run(program.desc, scope, 0, True, True,
RuntimeError: (PreconditionNotMet) The third-party dynamic library (libcublas.so) that Paddle depends on is not configured correctly. (error code is libcublas.so: cannot open shared object file: No such file or directory)
  Suggestions:
  1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
  2. Configure third-party dynamic library environment variables as follows:
  - Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
  - Windows: set PATH by `set PATH=XXX; (at /paddle/paddle/phi/backends/dynload/dynamic_loader.cc:303)

notebooks_research@jupyter-zhenwei--zhenweicv:/projects/zhenwei$ ls -l /projects/zhenwei/
total 2097197
drwxr-xr-x   3 notebooks_research root          0 11月  2  2022 app_repos
drwxr-xr-x   3 notebooks_research root          0 11月  2  2022 apps
-rw-r--r--   1 notebooks_research root        487 7月  25 08:48 getRuntimeVersion.c
-rw-r--r--   1 notebooks_research root         96 7月  25 09:30 hello.c
-rw-r--r--   1 notebooks_research root 2147483648 7月  27 08:34 large_file.txt
lrwxrwxrwx   1 notebooks_research root         39 7月  31 01:46 libcublas.so -> /usr/local/cuda/libcublas.so.11.2.0.252
-rw-r--r--   1 notebooks_research root       4529 7月  31 01:57 mlflow.ipynb
drwxr-xr-x   2 root               root          0 4月  12 05:33 shared_data
drwxrwxrwx 277 notebooks_research root          0 7月  30 09:14 shared-lib
-rw-r--r--   1 notebooks_research root         95 7月  31 02:20 test.py
-rw-r--r--   1 notebooks_research root      39034 7月  31 02:06 Untitled.ipynb
notebooks_research@jupyter-zhenwei--zhenweicv:/projects/zhenwei$
  • The file libcublas.so already exists in /projects/zhenwei as a symlink, and /projects/zhenwei is on LD_LIBRARY_PATH. Why does it still fail?

bksxznpy 7#

PaddlePaddle was installed with:

pip install paddlepaddle-gpu==2.3.0.post110 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

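jq6vz3qz 8#

Hi @WystanW, based on your screenshot:

The path of libcublas.so.11.2.0.252 should be "/usr/local/cuda/lib64/libcublas.so.11.2.0.252", not "/usr/local/cuda/libcublas.so.11.2.0.252" as shown in your listing. The lib64 component is missing from the path.

Please run ls -l /usr/local/cuda/libcublas.so.11.2.0.252 to confirm whether the library path is correct, then run ldd /usr/local/cuda/libcublas.so.11.2.0.252 to check whether the library's other dependencies resolve correctly in your environment. Thanks!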
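
The symlink check can also be done from Python. This is a minimal sketch under the assumption that the link lives at /projects/zhenwei/libcublas.so as in the listing above; describe_link is a made-up helper name. It shows where the link points and whether that target actually exists.

```python
import os


def describe_link(path):
    """Return (is_symlink, raw_target, target_exists) for path.

    Hypothetical helper for illustration only.
    """
    is_link = os.path.islink(path)
    raw_target = os.readlink(path) if is_link else None
    # os.path.exists follows symlinks, so it is False for a dangling link.
    target_exists = os.path.exists(path)
    return is_link, raw_target, target_exists


if __name__ == "__main__":
    print(describe_link("/projects/zhenwei/libcublas.so"))
```

A result with is_symlink True and target_exists False means the link is dangling, which the dynamic loader reports as "No such file or directory" even though the link itself shows up in ls.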
