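I am trying to run my Keras CuDNNGRU code on TensorFlow with a GPU, but it always fails with the error "Fail to find the dnn implementation", even though I have installed both CUDA and CuDNN.
I have reinstalled CUDA and CuDNN several times and upgraded CuDNN from 7.2.1 to 7.5.0, but that fixed nothing. I also tried running the code both in Jupyter Notebook and with the Python interpreter (in a terminal), and the result is the same. Here are the details of my hardware and software.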
1. Tesla V100 PCIE 16 GB
2. Ubuntu 18.04
3. NVIDIA-SMI 384.183
4. CUDA 9.0
5. CuDNN 7.5.0
6. Miniconda 3
7. Python 3.6
8. tensorflow 1.12
9. Keras 2.1.6
Here is my code.
encoder_LSTM = tf.keras.layers.CuDNNGRU(hidden_unit,return_sequences=True,return_state=True)
encoder_LSTM_rev=tf.keras.layers.CuDNNGRU(hidden_unit,return_state=True,return_sequences=True,go_backwards=True)
encoder_outputs, state_h = encoder_LSTM(x)
encoder_outputsR, state_hR = encoder_LSTM_rev(x)
Here is the error message.
2019-05-27 19:08:06.814896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-05-27 19:08:06.814956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-27 19:08:06.814971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-05-27 19:08:06.814978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-05-27 19:08:06.815279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14678 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
2019-05-27 19:08:08.050226: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-05-27 19:08:08.050350: E tensorflow/stream_executor/cuda/cuda_dnn.cc:381] Possibly insufficient driver version: 384.183.0
2019-05-27 19:08:08.050378: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cudnn_rnn_ops.cc:1214 : Unknown: Fail to find the dnn implementation.
2019-05-27 19:08:08.050483: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-05-27 19:08:08.050523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:381] Possibly insufficient driver version: 384.183.0
2019-05-27 19:08:08.050541: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cudnn_rnn_ops.cc:1214 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
[[{{node cu_dnngru/CudnnRNN}} = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnngru/transpose, cu_dnngru/ExpandDims, gradients/while/Shape/Enter_grad/zeros/Const, cu_dnngru/concat)]]
[[{{node mean_squared_error/value/_37}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1756_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ta_skenario1.py", line 271, in <module>
losss, op = sess.run([loss, optimizer], feed_dict={x:data,y_label:label,initial_input:begin_sentence})
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
[[node cu_dnngru/CudnnRNN (defined at ta_skenario1.py:205) = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnngru/transpose, cu_dnngru/ExpandDims, gradients/while/Shape/Enter_grad/zeros/Const, cu_dnngru/concat)]]
[[{{node mean_squared_error/value/_37}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1756_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'cu_dnngru/CudnnRNN', defined at:
File "ta_skenario1.py", line 205, in <module>
encoder_outputs, state_h = encoder_LSTM(x)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/recurrent.py", line 619, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 109, in call
output, states = self._process_batch(inputs, initial_state)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 299, in _process_batch
rnn_mode='gru')
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 116, in cudnn_rnn
is_training=is_training, name=name)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/paperspace/.conda/envs/gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Fail to find the dnn implementation.
[[node cu_dnngru/CudnnRNN (defined at ta_skenario1.py:205) = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnngru/transpose, cu_dnngru/ExpandDims, gradients/while/Shape/Enter_grad/zeros/Const, cu_dnngru/concat)]]
[[{{node mean_squared_error/value/_37}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1756_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Any ideas? Thanks.
Update: I tried downgrading CuDNN from 7.5.0 to 7.1.4, but the result is still the same.
6 Answers
eoigrqb61#
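Configuring the GPU to allow memory growth worked for me on TF 2.0. I found this solution in another question a few months ago, when I had a problem running TF 2.0; I don't remember where.
Add the following and it will probably be fine.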
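The snippet itself is missing from this copy of the answer. As a minimal sketch of what enabling memory growth usually looks like in TF 2.x, assuming the `tf.config.experimental` API (the helper name `enable_memory_growth` is hypothetical, not from the answer):

```python
# Minimal sketch, assuming TensorFlow 2.x. The helper name is hypothetical.
try:
    import tensorflow as tf
except ImportError:  # report a missing install instead of crashing
    tf = None


def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand; return how many GPUs were configured."""
    if tf is None:
        return 0
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before any op touches the GPU, otherwise TensorFlow raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


# Call this at the very top of the program, before building any model.
print("GPUs configured for memory growth:", enable_memory_growth())
```

On TF 1.x, such as the 1.12 used in the question, the comparable setting is `config.gpu_options.allow_growth = True` on a `tf.ConfigProto` that is then passed to `tf.Session(config=config)`.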
lyfkaqu12#
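Have you tested your installation (cuda, cudnn, tensorflow-gpu)?
**Test cuda:** First check whether:
shows the correct version of your cuda toolkit. You can then test it with the following procedure:
First (this takes a few minutes):
Then:
If you get "Result = PASS", you are all good!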
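The commands for this step are missing from this copy of the answer. As a rough sketch of the first check only (the toolkit version), assuming `nvcc` is on the `PATH`, something like the following could be used. The function names are hypothetical, and the sample build and the step that prints "Result = PASS" are not reproduced here.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return the toolkit release (e.g. "9.0") from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return the release it reports, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_nvcc_release(result.stdout)


print("CUDA toolkit release:", installed_cuda_release())
```

A result of `None` means `nvcc` was not found or its output did not contain a release number, which usually points to the toolkit's `bin` directory not being on the `PATH`.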
**Test cudnn:**
The result should be: "Test passed!"
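The command for this test is also missing from this copy. The sketch below does not run that test; it is a lighter, different check that only reports which cuDNN version is installed by reading the version macros from the header. The header path is an assumption (it varies by install method), and the function names are hypothetical.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of a cuDNN header, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def read_cudnn_version(path="/usr/local/cuda/include/cudnn.h"):
    """Read the header at `path` (an assumed location) and parse its version macros."""
    try:
        with open(path, errors="replace") as header:
            return parse_cudnn_version(header.read())
    except OSError:
        return None


print("cuDNN version:", read_cudnn_version())
```

For cuDNN 7.x the macros live in `cudnn.h`; from cuDNN 8 onward they moved to `cudnn_version.h`, so the path would need adjusting there.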
**Test tensorflow-gpu:**
If cuda and cudnn work correctly, you can test your tensorflow installation with the following command:
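The command is missing from this copy of the answer. One way to run such a check, sketched here under the assumption that TensorFlow's `device_lib.list_local_devices()` is available (the helper names are hypothetical), is to list the devices TensorFlow can see and keep the GPU entries:

```python
try:
    from tensorflow.python.client import device_lib
except ImportError:  # TensorFlow is not installed in this environment
    device_lib = None


def gpu_device_names(devices):
    """Return the names of the entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def visible_gpus():
    """List the GPUs TensorFlow can see, or None when TensorFlow is missing."""
    if device_lib is None:
        return None
    return gpu_device_names(device_lib.list_local_devices())


print("GPUs visible to TensorFlow:", visible_gpus())
```

A listed GPU only shows that TensorFlow found the device. In the question's log the GPU device was created successfully and the cuDNN handle still failed afterwards, so this check does not rule out a cuDNN problem on its own.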
I suggest installing tensorflow in a conda environment, using:
For me (after running into a lot of problems), this works very well.
Source:
t40tm48m3#
As suggested here, this worked for me in TensorFlow 2:
klr1opcd4#
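Not sure whether this helps, but in my case the problem was caused by using multiple Jupyter notebook files.
I was writing a simple neural network and decided to split it into two notebooks, one for training and one for prediction (in case you don't have the resources or time to train your network, I provide my saved model in a file).
If I ran the two notebooks "together", that is, training first and then prediction without shutting down the first notebook's kernel, I got this error.
Shutting down the kernel of the first Jupyter notebook before using the second one solved my problem.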
olmpazwi5#
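If you run into this problem when using cuDNN 7 with TF 2.0 and CUDA 10.0, it is most likely because you accidentally upgraded cuDNN from 7.6.2 to >7.6.5. Although the TF documentation states that any version >=7.4.1 works fine, that is not actually the case! Downgrade cuDNN with the following steps:
In the future, on Ubuntu/Debian, you can put cuDNN on hold by marking its updates in aptitude:
suzh9iv86#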
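After checking the versions of all of the following packages, my code worked: cuda, cudnn, tensorflow, and gcc. You need to find the matching versions for all of them. Hope this helps!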