Issue Type
Bug
Source
source
TensorFlow version
2.10.0
Custom code
Yes
OS platform and distribution
Ubuntu 20.04.4
Mobile device
No response
Python version
3.8.10
Bazel version
5.1.1
GCC/compiler version
9.4.0
CUDA/cuDNN version
11.2
GPU model and memory
2x RTX 3090, 24 GB
Current behavior?
The results of the `LRNGrad` operator are inconsistent between CPU and GPU.
I have tried both `tf.raw_ops.LRNGrad` and `nn.lrn_grad`, and also swapped the order of the device calls, but the results are still inconsistent.
Standalone code to reproduce the issue
import tensorflow as tf
from tensorflow.python.ops import nn
from tensorflow.python.ops import random_ops
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/lrn_op.cc
input_grads = random_ops.random_uniform(
    shape=[1, 1, 1, 3],
    minval=-10000,
    maxval=10000,
    dtype=tf.float32,
    seed=2022)
input_img = random_ops.random_uniform(
    shape=[1, 1, 1, 3],
    minval=-10000,
    maxval=10000,
    dtype=tf.float32,
    seed=2022)
output_img = random_ops.random_uniform(
    shape=[1, 1, 1, 3],
    minval=-10000,
    maxval=10000,
    dtype=tf.float32,
    seed=2022)

with tf.device('/GPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)

with tf.device('/CPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)
Relevant log output
# python LRNgrad-test.py
2022-07-21 12:54:57.023583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:54:57.132843: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-07-21 12:54:57.161994: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-07-21 12:55:00.101305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:55:01.526010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22298 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:9b:00.0, compute capability: 8.6
2022-07-21 12:55:01.527383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22298 MB memory: -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:c8:00.0, compute capability: 8.6
2022-07-21 12:55:02.763201: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100
tf.Tensor([[[[-0.29212222 0.97755533 -0.28474247]]]], shape=(1, 1, 1, 3), dtype=float32)
tf.Tensor([[[[2362.0498 1360.1172 2242.2402]]]], shape=(1, 1, 1, 3), dtype=float32)
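The two tensors in the log differ by orders of magnitude, not just by round-off. Continuing the repro snippet above, a tolerance check along these lines (a sketch; the `np.testing` call and the tolerances are my own addition, not part of the original report) makes the mismatch explicit:

import numpy as np

# Run the gradient op on both devices with the same inputs.
with tf.device('/CPU:0'):
    cpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)
with tf.device('/GPU:0'):
    gpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

# Expected to fail on the reported setup, given the two tensors in the log above.
np.testing.assert_allclose(cpu_out.numpy(), gpu_out.numpy(), rtol=1e-4, atol=1e-4)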
6 Answers
inkz8wg9 #1
@gadagashwini,
I can reproduce this issue on TensorFlow v2.8, v2.9, and the nightly build. Please find the gist of it here.
h79rfbju #2
@enderdzz, I tried to reproduce your issue with TensorFlow 2.9.1, but I got a different error. Could you please take a look at the gist here and make the necessary changes? Thank you!
zqdjd7g9 #3
Hi, thanks for your reply :)
I installed 2.9.1 locally (CUDA 11.2), ran the same code without any problem, and still get inconsistent results. I suspect the remote Colab GPU environment is AMD ROCm, which is why you ran into that error; see RadeonOpenCompute/ROCm#684.
So it would be better to switch to a CUDA environment to test this code.
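One quick way to confirm what the environment under test actually is (a small sketch; these checks are an addition of mine, not from the thread) is to ask TensorFlow itself whether it is a CUDA or a ROCm build and which GPUs it can see:

import tensorflow as tf

# Report whether this TensorFlow binary was built against CUDA or ROCm,
# and which physical GPUs are visible to it.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Built with ROCm:", tf.test.is_built_with_rocm())
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))
print("Build info:", dict(tf.sysconfig.get_build_info()))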
1l5u6lss #4
Colab GPUs use NVIDIA and CUDA.
Here is what you get when you run `!nvidia-smi`:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    29W /  70W |    464MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
dsf9zpds #5
OK. I don't know what is going on with this issue at the moment.
Hopefully the relevant TF developers can help.
pqwbnv8z #6
> If I change `output_img` to …

I think it is reasonable for the CPU and GPU gradients to return different results if you pass invalid values for `output_image`. `output_image` must be the correct forward-pass output for the given `input_image`; if you pass an invalid value for `output_image`, the op does not have well-defined semantics. @rohan100jain, do you agree that it is fine for the gradient op to return different results on CPU and GPU when it is given an invalid output from the forward pass?
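To make that concrete, here is a minimal sketch of what a "valid" call could look like (my own illustration; the default LRN hyperparameters and the tolerance are assumptions, not from the thread): compute `output_image` with an actual forward `tf.nn.lrn` pass instead of using a random tensor, then compare the CPU and GPU gradients.

import tensorflow as tf

# Random input image and incoming gradients (the op-level seeds are arbitrary).
input_img = tf.random.uniform([1, 1, 1, 3], minval=-10000, maxval=10000,
                              dtype=tf.float32, seed=2022)
input_grads = tf.random.uniform([1, 1, 1, 3], minval=-10000, maxval=10000,
                                dtype=tf.float32, seed=2023)

# A valid output_image: the actual forward-pass LRN output for input_img.
# tf.nn.lrn's defaults (depth_radius=5, bias=1, alpha=1, beta=0.5) match the
# defaults of tf.raw_ops.LRNGrad, so both ops describe the same LRN.
output_img = tf.nn.lrn(input_img)

grads = {}
for dev in ('/CPU:0', '/GPU:0'):
    with tf.device(dev):
        grads[dev] = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                        input_image=input_img,
                                        output_image=output_img)

# With a consistent forward pass, the two gradients should agree up to
# floating-point tolerance.
tf.debugging.assert_near(grads['/CPU:0'], grads['/GPU:0'], atol=1e-3)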