Tensorflow: Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory

Posted by mzillmmw on 2023-03-13 in Other

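I have been trying to run TensorFlow on my GPU for several days without success.
I know there are several questions about similar problems, but I have tried everything I found and none of it worked, which is why I am writing this question: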
How to install libcusolver.so.11
https://stackoverflow.com/a/67642774/15098668
I have installed driver 460.106.00 and CUDA 11.2 for an Nvidia GeForce RTX 3090:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    On   | 00000000:08:00.0  On |                  N/A |
| 33%   26C    P8    22W / 350W |    282MiB / 24260MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1264      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      3349      G   /usr/lib/xorg/Xorg                124MiB |
|    0   N/A  N/A      3508      G   /usr/bin/gnome-shell               77MiB |
|    0   N/A  N/A      6384      G   /usr/lib/firefox/firefox            4MiB |
+-----------------------------------------------------------------------------+

cuDNN:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1

GCC:

gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

I also added LD_LIBRARY_PATH to ~/.bashrc:

# Nvidia cuda toolkit
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
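
These exports only help if the directory they name exists and actually holds the libraries, and if the Python process was started from a shell that has sourced the file. A minimal sketch for checking this from inside Python follows; the function name `summarize_library_path` and the `libcu*.so*` pattern are illustrative choices, not part of any CUDA or TensorFlow tooling.

```python
import os
from pathlib import Path


def summarize_library_path(search_path=None):
    """Return (entry, exists, CUDA-looking libraries) for each search-path entry."""
    if search_path is None:
        # Reads the value this process inherited, not what ~/.bashrc says.
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    report = []
    for entry in search_path.split(":"):
        if not entry:
            continue  # skip empty entries left by leading or trailing colons
        directory = Path(entry)
        exists = directory.is_dir()
        # libcu*.so* is a loose pattern: libcudart, libcublas, libcusolver, libcudnn, ...
        libs = sorted(p.name for p in directory.glob("libcu*.so*")) if exists else []
        report.append((entry, exists, libs))
    return report


if __name__ == "__main__":
    for entry, exists, libs in summarize_library_path():
        status = "ok" if exists else "missing directory"
        print(f"{entry} [{status}]: {len(libs)} matching libraries")
        for name in libs:
            print(f"  {name}")
```

If this prints nothing at all, LD_LIBRARY_PATH is empty in the environment that launched Python, which is common when the script is started from an IDE rather than from a terminal.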

I tried several tensorflow and tensorflow-gpu versions, from 2.4 to 2.7, but every one of them failed:

2022-01-24 21:28:43.206834: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

2022-01-24 21:28:44.087779: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087827: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087858: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087891: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087921: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087947: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087975: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
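
The warnings above come from the dynamic loader failing to open each library by name. The same lookup can be reproduced without TensorFlow by asking `ctypes` to open the names listed in the warnings. This is a rough sketch: the helper `try_load` is a made-up name, and a library that loads here could in principle still fail inside TensorFlow for other reasons.

```python
import ctypes

# Library names copied from the TensorFlow warnings in this question.
SONAMES = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def try_load(soname):
    """Return None if the dynamic loader can open soname, else the error text."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for soname in SONAMES:
        error = try_load(soname)
        print(f"{soname}: {'ok' if error is None else error}")
```

Run it with the same interpreter and from the same shell or IDE configuration that runs TensorFlow, since the loader search path is inherited from the launching environment.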

Thanks in advance, I don't know what else to try...

Answer #1, by w1jd8yoj

After trying a lot of things, I created a new conda environment and installed tensorflow-gpu, since I don't care about the TF version:

conda install tensorflow-gpu -c anaconda

It installed all of the following packages:

    package                    |            build
    ---------------------------|-----------------
    _tflow_select-2.1.0        |              gpu           2 KB  anaconda
    absl-py-0.10.0             |           py38_0         170 KB  anaconda
    aiohttp-3.6.3              |   py38h7b6447c_0         622 KB  anaconda
    astunparse-1.6.3           |             py_0          17 KB  anaconda
    async-timeout-3.0.1        |           py38_0          12 KB  anaconda
    attrs-20.2.0               |             py_0          41 KB  anaconda
    blas-1.0                   |              mkl           6 KB  anaconda
    blinker-1.4                |           py38_0          21 KB  anaconda
    brotlipy-0.7.0             |py38h7b6447c_1000         349 KB  anaconda
    c-ares-1.16.1              |       h7b6447c_0         112 KB  anaconda
    ca-certificates-2020.10.14 |                0         128 KB  anaconda
    cachetools-4.1.1           |             py_0          12 KB  anaconda
    certifi-2020.6.20          |           py38_0         160 KB  anaconda
    cffi-1.14.0                |   py38h2e261b9_0         228 KB  anaconda
    chardet-3.0.4              |        py38_1003         170 KB  anaconda
    click-7.1.2                |             py_0          67 KB  anaconda
    cryptography-3.1.1         |   py38h1ba5d50_0         618 KB  anaconda
    cudatoolkit-10.1.243       |       h6bb024c_0       513.2 MB  anaconda
    cudnn-7.6.5                |       cuda10.1_0       250.6 MB  anaconda
    cupti-10.1.168             |                0         1.7 MB  anaconda
    gast-0.3.3                 |             py_0          14 KB  anaconda
    google-auth-1.22.1         |             py_0          62 KB  anaconda
    google-auth-oauthlib-0.4.1 |             py_2          21 KB  anaconda
    google-pasta-0.2.0         |             py_0          44 KB  anaconda
    grpcio-1.31.0              |   py38hf8bcb03_0         2.3 MB  anaconda
    h5py-2.10.0                |   py38hd6299e0_1         1.1 MB  anaconda
    hdf5-1.10.6                |       hb1b8bf9_0         4.8 MB  anaconda
    idna-2.10                  |             py_0          56 KB  anaconda
    importlib-metadata-2.0.0   |             py_1          35 KB  anaconda
    intel-openmp-2020.2        |              254         947 KB  anaconda
    keras-preprocessing-1.1.0  |             py_1          36 KB  anaconda
    libgfortran-ng-7.3.0       |       hdf63c60_0         1.3 MB  anaconda
    libprotobuf-3.13.0.1       |       hd408876_0         2.3 MB  anaconda
    markdown-3.3.2             |           py38_0         123 KB  anaconda
    mkl-2019.4                 |              243       204.1 MB  anaconda
    mkl-service-2.3.0          |   py38he904b0f_0          68 KB  anaconda
    mkl_fft-1.2.0              |   py38h23d657b_0         173 KB  anaconda
    mkl_random-1.1.0           |   py38h962f231_0         398 KB  anaconda
    multidict-4.7.6            |   py38h7b6447c_1          72 KB  anaconda
    numpy-1.19.1               |   py38hbc911f0_0          20 KB  anaconda
    numpy-base-1.19.1          |   py38hfa32c7d_0         5.3 MB  anaconda
    oauthlib-3.1.0             |             py_0          88 KB  anaconda
    openssl-1.1.1h             |       h7b6447c_0         3.8 MB  anaconda
    opt_einsum-3.1.0           |             py_0          54 KB  anaconda
    protobuf-3.13.0.1          |   py38he6710b0_1         702 KB  anaconda
    pyasn1-0.4.8               |             py_0          58 KB  anaconda
    pyasn1-modules-0.2.8       |             py_0          67 KB  anaconda
    pycparser-2.20             |             py_2          94 KB  anaconda
    pyjwt-1.7.1                |           py38_0          32 KB  anaconda
    pyopenssl-19.1.0           |             py_1          47 KB  anaconda
    pysocks-1.7.1              |           py38_0          27 KB  anaconda
    requests-2.24.0            |             py_0          54 KB  anaconda
    requests-oauthlib-1.3.0    |             py_0          22 KB  anaconda
    rsa-4.6                    |             py_0          26 KB  anaconda
    scipy-1.5.2                |   py38h0b6359f_0        18.7 MB  anaconda
    six-1.15.0                 |             py_0          13 KB  anaconda
    tensorboard-2.2.1          |     pyh532a8cf_0         2.5 MB  anaconda
    tensorboard-plugin-wit-1.6.0|             py_0         663 KB  anaconda
    tensorflow-2.2.0           |gpu_py38hb782248_0           4 KB  anaconda
    tensorflow-base-2.2.0      |gpu_py38h83e3d50_0       421.3 MB  anaconda
    tensorflow-estimator-2.2.0 |     pyh208ff02_0         276 KB  anaconda
    tensorflow-gpu-2.2.0       |       h0d30ee6_0           2 KB  anaconda
    termcolor-1.1.0            |           py38_1           8 KB  anaconda
    urllib3-1.25.11            |             py_0          93 KB  anaconda
    werkzeug-1.0.1             |             py_0         243 KB  anaconda
    wrapt-1.12.1               |   py38h7b6447c_1          50 KB  anaconda
    yarl-1.6.2                 |   py38h7b6447c_0         142 KB  anaconda
    zipp-3.3.1                 |             py_0          11 KB  anaconda
    ------------------------------------------------------------
                                           Total:        1.41 GB

Including cudatoolkit and cudnn...
After that, I don't know why, TF detected the Nvidia card:

2022-01-25 09:37:52.865587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-01-25 09:37:52.902796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.903487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.69GiB deviceMemoryBandwidth: 871.81GiB/s
2022-01-25 09:37:52.903637: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-01-25 09:37:52.904633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-01-25 09:37:52.905878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-01-25 09:37:52.906023: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-01-25 09:37:52.907115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-01-25 09:37:52.907719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-01-25 09:37:52.910042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-01-25 09:37:52.910137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.911078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.911707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
Num GPUs Available:  1

Process finished with exit code 0
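
The script that produced the `Num GPUs Available` line is not shown. A check along the following lines would print something similar, assuming it was based on `tf.config.list_physical_devices`; the wrapper `gpu_names` is a hypothetical helper, and it returns `None` when TensorFlow cannot be imported at all.

```python
def gpu_names():
    """Return the GPU device names TensorFlow sees, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow imported fine but found no usable GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    names = gpu_names()
    if names is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("Num GPUs Available: ", len(names))
```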

Answer #2, by wswtfjt7

Make sure you follow the TensorFlow software compatibility table: https://www.tensorflow.org/install/source#gpu
More details here: https://stackoverflow.com/a/50622526
I ran into this problem when using:

  • Python==3.10
  • tensorflow==2.8.0
  • CUDA==11.0
  • cuDNN==8.0

I solved it by downgrading Python and TensorFlow to 3.6 and 2.4.0 respectively, so that the TensorFlow compatibility requirements were met.
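
The comparison against the compatibility table can be written down as a small lookup. This is only a sketch: the three rows in `TESTED` were copied by hand from the tested build configurations table linked above and should be checked against it, and the function `mismatches` is a made-up helper.

```python
# Hand-copied rows (assumption: verify against the linked table before relying on them).
# Python bounds are inclusive (major, minor) tuples.
TESTED = {
    "2.2.0": {"python": ((3, 5), (3, 8)), "cuda": "10.1", "cudnn": "7.6"},
    "2.4.0": {"python": ((3, 6), (3, 8)), "cuda": "11.0", "cudnn": "8.0"},
    "2.8.0": {"python": ((3, 7), (3, 10)), "cuda": "11.2", "cudnn": "8.1"},
}


def mismatches(tf_version, python, cuda, cudnn):
    """Return a list of differences from the tested row; an empty list means a match."""
    row = TESTED[tf_version]
    low, high = row["python"]
    problems = []
    if not (low <= python <= high):
        problems.append(f"Python {python[0]}.{python[1]} is outside {low[0]}.{low[1]}-{high[0]}.{high[1]}")
    if cuda != row["cuda"]:
        problems.append(f"CUDA {cuda} installed, {row['cuda']} tested")
    if cudnn != row["cudnn"]:
        problems.append(f"cuDNN {cudnn} installed, {row['cudnn']} tested")
    return problems


if __name__ == "__main__":
    # The failing combination from this answer, then the one after downgrading.
    print(mismatches("2.8.0", (3, 10), "11.0", "8.0"))
    print(mismatches("2.4.0", (3, 6), "11.0", "8.0"))
```

With these rows, the first call reports the CUDA and cuDNN differences and the second returns an empty list, which matches the outcome described in this answer. The 2.2.0 row likewise lines up with the cudatoolkit 10.1 and cudnn 7.6 packages that conda pulled in for the first answer.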
