TensorFlow: tf.raw_ops.LRNGrad returns inconsistent results on CPU and GPU

Posted by jk9hmnmh on 2022-10-29 in Other


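Issue type

Bug

Source

TensorFlow version

2.10.0

Custom code

Yes

OS platform and distribution

Ubuntu 20.04.4 LTS

Mobile device

No response

Python version

3.8.10

Bazel version

5.1.1

GCC/compiler version

9.4.0

CUDA/cuDNN version

11.2

GPU model and memory

RTX 3090, 2 × 24 GB

Current behavior?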

The results of `LRNGrad` operators are inconsistent between CPU and GPU.

I tried both `tf.raw_ops.LRNGrad` and `nn.lrn_grad`,
and also swapped the order in which the devices are called,
but the results are still inconsistent.

Standalone code to reproduce the issue

import tensorflow as tf
from tensorflow.python.ops import nn
from tensorflow.python.ops import random_ops

# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/lrn_op.cc

input_grads = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
input_img   = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
output_img  = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)

with tf.device('/GPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)

with tf.device('/CPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)

Relevant log output


# python LRNgrad-test.py

2022-07-21 12:54:57.023583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:54:57.132843: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-07-21 12:54:57.161994: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-07-21 12:55:00.101305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:55:01.526010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22298 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:9b:00.0, compute capability: 8.6
2022-07-21 12:55:01.527383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22298 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:c8:00.0, compute capability: 8.6
2022-07-21 12:55:02.763201: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100

tf.Tensor([[[[-0.29212222  0.97755533 -0.28474247]]]], shape=(1, 1, 1, 3), dtype=float32)

tf.Tensor([[[[2362.0498 1360.1172 2242.2402]]]], shape=(1, 1, 1, 3), dtype=float32)
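
The oneDNN notice in the log mentions "slightly different numerical results" from round-off. As a quick sanity check, the sketch below compares the two printed tensors (GPU first, then CPU, following the order of the `with tf.device` blocks) and shows the gap is far larger than round-off. The `allclose` helper and its tolerances are assumptions chosen for illustration, not TensorFlow defaults.

```python
import math

# Values copied from the log output above (GPU printed first, then CPU).
gpu_out = [-0.29212222, 0.97755533, -0.28474247]
cpu_out = [2362.0498, 1360.1172, 2242.2402]


def allclose(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Element-wise closeness check; tolerances are assumed float32-level values."""
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))


# Largest element-wise gap between the two device results.
max_abs_diff = max(abs(x - y) for x, y in zip(gpu_out, cpu_out))

print("within tolerance:", allclose(gpu_out, cpu_out))
print("max abs difference:", max_abs_diff)
```

A difference of this size points to the two kernels computing different things for these inputs, not to a different order of floating-point operations.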


Source: https://github.com/tensorflow/tensorflow/issues/56849

6 answers


inkz8wg91#

@gadagashwini,
I was able to reproduce this issue on TensorFlow v2.8, v2.9, and nightly. Please find the gist here.


h79rfbju2#

@enderdzz, I tried to reproduce your issue on TensorFlow 2.9.1, but I got a different error. Could you please look at the gist here and make the necessary changes? Thanks!


zqdjd7g93#

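Hi, thanks for your reply :)
I installed version 2.9.1 (CUDA 11.2) locally and the same code runs without any error, but it still produces inconsistent results. I suspect the remote Colab GPU environment is AMD ROCm, which would explain the error you ran into; see RadeonOpenCompute/ROCm#684.
So it would be best to switch to a CUDA environment to test this code.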


1l5u6lss4#

Colab GPUs use NVIDIA hardware and CUDA.
Here is what you get when you run !nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    29W /  70W |    464MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+


dsf9zpds5#

OK. I have no idea what is causing this issue for now.
I hope the relevant TF developers can help.


pqwbnv8z6#

If I change output_img to

output_img = tf.nn.local_response_normalization(input_img)

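then the CPU and GPU results match. I think it is reasonable for the CPU and GPU gradients to return different results when an invalid value is passed as output_image. output_image must be the correct forward-pass output for the given input_image; if you pass an invalid output_image, the op has no well-defined semantics.
@rohan100jain, do you agree that it is fine for the gradient op to return different results on CPU and GPU when it is given an invalid forward-pass output?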
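
To make the argument above concrete, here is a minimal pure-Python sketch for a single pixel. It is not the TensorFlow CPU or GPU kernel code: `grad_from_input` and `grad_using_output` are hypothetical helpers that write the same LRN derivative in two algebraically equivalent ways, one recomputing everything from the input and one reusing the supplied output. The hyperparameters are assumed to be the documented defaults of `tf.nn.local_response_normalization` (depth_radius=5, bias=1, alpha=1, beta=0.5), and the sample values are made up.

```python
import math

# Assumed hyperparameters: documented defaults of tf.nn.local_response_normalization.
RADIUS, BIAS, ALPHA, BETA = 5, 1.0, 1.0, 0.5


def window(j, depth):
    """Channel indices that contribute to the normalization of channel j."""
    return range(max(0, j - RADIUS), min(depth, j + RADIUS + 1))


def norm(x, j):
    """bias + alpha * (sum of squares over the window around channel j)."""
    return BIAS + ALPHA * sum(x[k] ** 2 for k in window(j, len(x)))


def lrn_forward(x):
    """Forward pass for one pixel: y[j] = x[j] / norm(x, j) ** beta."""
    return [x[j] / norm(x, j) ** BETA for j in range(len(x))]


def grad_from_input(x, g):
    """Input gradient computed from x and the upstream gradient g only."""
    dx = [0.0] * len(x)
    for j in range(len(x)):
        n = norm(x, j)
        for k in window(j, len(x)):
            d = -2.0 * ALPHA * BETA * x[j] * x[k] * n ** (-BETA - 1.0)
            if k == j:
                d += n ** -BETA
            dx[k] += g[j] * d
    return dx


def grad_using_output(x, y, g):
    """Same derivative, with x[j] * n ** -beta replaced by the supplied y[j]."""
    dx = [0.0] * len(x)
    for j in range(len(x)):
        n = norm(x, j)
        for k in window(j, len(x)):
            d = -2.0 * ALPHA * BETA * y[j] * x[k] / n
            if k == j:
                d += n ** -BETA
            dx[k] += g[j] * d
    return dx


def same(a, b):
    return all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-12) for p, q in zip(a, b))


x = [1.0, -2.0, 3.0]          # made-up input_image values for one pixel
g = [0.5, -1.0, 2.0]          # made-up input_grads values
y_valid = lrn_forward(x)      # a genuine forward-pass output
y_invalid = [10.0, 20.0, 30.0]  # arbitrary values, like the random output_img

print("valid output, formulas agree:  ", same(grad_from_input(x, g), grad_using_output(x, y_valid, g)))
print("invalid output, formulas agree:", same(grad_from_input(x, g), grad_using_output(x, y_invalid, g)))
```

The two formulas coincide only when `y` really is the forward output of `x`. If two device kernels reuse output_image at different points of the computation, an arbitrary output_image would be expected to make them diverge, which is consistent with the behavior reported in this thread.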

