TensorFlow: inconsistent tf.raw_ops.LRNGrad results between CPU and GPU

Posted by jk9hmnmh on 2022-10-29 in Other

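Issue type

Bug

Source

source

TensorFlow version

2.10.0

Custom code

Yes

OS platform and distribution

Ubuntu 20.04.4 LTS

Mobile device

No response

Python version

3.8.10

Bazel version

5.1.1

GCC/compiler version

9.4.0

CUDA/cuDNN version

11.2

GPU model and memory

2 x RTX 3090, 24 GB each

Current behaviour?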

The results of the `LRNGrad` operator are inconsistent between CPU and GPU.

I tried both `tf.raw_ops.LRNGrad` and `nn.lrn_grad`,
and also swapped the order in which the devices are called;
the results are still inconsistent.

Standalone code to reproduce the issue

import tensorflow as tf
from tensorflow.python.ops import nn
from tensorflow.python.ops import random_ops

# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/lrn_op.cc

input_grads = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
input_img   = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
output_img  = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)

with tf.device('/GPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)

with tf.device('/CPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)

Relevant log output


# python LRNgrad-test.py

2022-07-21 12:54:57.023583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:54:57.132843: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-07-21 12:54:57.161994: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-07-21 12:55:00.101305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:55:01.526010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22298 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:9b:00.0, compute capability: 8.6
2022-07-21 12:55:01.527383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22298 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:c8:00.0, compute capability: 8.6
2022-07-21 12:55:02.763201: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100

tf.Tensor([[[[-0.29212222  0.97755533 -0.28474247]]]], shape=(1, 1, 1, 3), dtype=float32)

tf.Tensor([[[[2362.0498 1360.1172 2242.2402]]]], shape=(1, 1, 1, 3), dtype=float32)

inkz8wg9 #1

@gadagashwini,
I was able to reproduce this issue on TensorFlow v2.8, v2.9 and nightly. Please find the gist here.

h79rfbju #2

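@enderdzz, I tried to reproduce your issue on TensorFlow 2.9.1, but I got a different error. Could you please take a look at the gist here and make the necessary changes? Thank you!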

zqdjd7g9 #3

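Hi, thanks for your reply :)
I installed version 2.9.1 locally (CUDA 11.2). The same code runs without any error and still produces inconsistent results. I suspect the remote Colab GPU environment is AMD ROCm, which would explain the error you ran into; see RadeonOpenCompute/ROCm#684.
So it would be better to switch to a CUDA environment to test this code.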

1l5u6lss #4

Colab GPUs use NVIDIA hardware and CUDA.
Below is what you get when you run !nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    29W /  70W |    464MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

dsf9zpds #5

OK. I have no idea what causes this problem at the moment.
I hope the relevant TF developers can help.

pqwbnv8z #6

If I change output_img to

output_img = tf.nn.local_response_normalization(input_img)

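I think it is reasonable for the CPU and GPU gradients to return different results if you pass an invalid value for output_image. output_image must be the correct forward-pass output for the given input_image; if you pass an invalid value for output_image, the op has no well-defined semantics.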
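To make "correct forward-pass output" concrete, here is a minimal NumPy sketch of the forward formula as documented for tf.nn.local_response_normalization, assuming its default parameters (depth_radius=5, bias=1, alpha=1, beta=0.5). The helper name lrn_forward is hypothetical and not part of TensorFlow. With these defaults every output element has magnitude below 1, so an output_image drawn uniformly from [-10000, 10000], as in the report, is almost surely not something the forward pass could have produced.

```python
import numpy as np


def lrn_forward(x, depth_radius=5, bias=1.0, alpha=1.0, beta=0.5):
    """Reference LRN over the last axis (hypothetical helper, not a TF API).

    Follows the documented formula:
      sqr_sum[..., d] = sum(x[..., d - r : d + r + 1] ** 2)
      output = x / (bias + alpha * sqr_sum) ** beta
    """
    x = np.asarray(x, dtype=np.float64)
    depth = x.shape[-1]
    sqr_sum = np.zeros_like(x)
    for d in range(depth):
        # Clip the window to the valid channel range.
        lo = max(0, d - depth_radius)
        hi = min(depth, d + depth_radius + 1)
        sqr_sum[..., d] = np.sum(x[..., lo:hi] ** 2, axis=-1)
    return x / (bias + alpha * sqr_sum) ** beta


x = np.array([[[[1.0, 2.0, 3.0]]]])  # same NHWC shape as in the report
y = lrn_forward(x)
print(y)
# With bias=1, alpha=1, beta=0.5 each |y| is |x| / sqrt(1 + window sum of squares),
# and the window always contains x itself, so |y| stays below 1.
print(np.max(np.abs(y)) < 1.0)
```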
@rohan100jain, do you agree that it is OK for the gradient op to return different results on CPU and GPU when it is given an invalid output from the forward pass?
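One way to check this on a given machine is to feed LRNGrad a genuine forward output and compare the devices numerically instead of by eye. The sketch below is only an illustration under assumptions, not the maintainers' test: the helper name lrn_grad_per_device and the tolerance are made up for this example, and it compares whatever CPU/GPU devices TensorFlow can see (on a CPU-only machine the comparison is trivial).

```python
import numpy as np
import tensorflow as tf


def lrn_grad_per_device(input_img, input_grads):
    """Run LRNGrad on each visible CPU/GPU with a valid output_image.

    Hypothetical helper for illustration; returns {device name: ndarray}.
    """
    # A valid output_image is the forward LRN result for the same input_image.
    output_img = tf.nn.local_response_normalization(input_img)
    results = {}
    for dev in tf.config.list_logical_devices():
        if dev.device_type not in ("CPU", "GPU"):
            continue
        with tf.device(dev.name):
            out = tf.raw_ops.LRNGrad(
                input_grads=input_grads,
                input_image=input_img,
                output_image=output_img,
            )
        results[dev.name] = out.numpy()
    return results


input_img = tf.random.uniform([1, 1, 1, 3], minval=-10000, maxval=10000, seed=2022)
input_grads = tf.random.uniform([1, 1, 1, 3], minval=-10000, maxval=10000, seed=2022)

results = lrn_grad_per_device(input_img, input_grads)
# Use the first CPU result as the reference value.
reference = next(v for k, v in results.items() if "CPU" in k)
for name, value in results.items():
    # rtol is an arbitrary choice for this sketch.
    print(name, value, "close to CPU:", np.allclose(value, reference, rtol=1e-4))
```

If the values still differ by more than rounding error with a valid output_image, that would point to a real kernel discrepancy rather than to undefined inputs.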
