Crash when running tensorflow.python.ops.gen_nn_ops.max_pool_grad_with_argmax

Posted by qkf9rpyu 2 months ago in Python

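Issue type

Bug

Have you reproduced the bug with TF nightly?

Source

source

TensorFlow version

2.11.0

Custom code

OS platform and distribution

No response

Mobile device

22.04

Python version

3.9

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

Cuda compilation tools, release 11.5, V11.5.119

GPU model and memory

No response

Current behavior?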

When max_pool_grad_with_argmax is given a tensor of negative integers, it crashes.
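
For context, the argmax tensor in the reproduction script is drawn from the range [-256, 257), whereas the flattened indices documented for tf.nn.max_pool_with_argmax are non-negative: (y * width + x) * channels + c when include_batch_in_index is False, and ((b * height + y) * width + x) * channels + c when it is True. Below is a minimal pure-Python sketch of a caller-side pre-check, assuming that indexing convention. The helper name argmax_in_range is hypothetical and is not part of TensorFlow.

```python
def argmax_in_range(argmax_values, input_shape, include_batch_in_index=False):
    """Return True if every flattened argmax index is valid for input_shape.

    input_shape is (batch, height, width, channels). The bounds follow the
    index layout documented for tf.nn.max_pool_with_argmax; this is an
    illustrative check, not TensorFlow's own validation.
    """
    batch, height, width, channels = input_shape
    per_batch = height * width * channels
    # With the batch folded into the index, the valid range grows by `batch`.
    upper = batch * per_batch if include_batch_in_index else per_batch
    return all(0 <= int(v) < upper for v in argmax_values)


# Input shape from the report: [2, 3, 3, 1], so 9 positions per batch element.
shape = (2, 3, 3, 1)
print(argmax_in_range([0, 4, 8], shape))       # True
print(argmax_in_range([-105, 3, 240], shape))  # False
```

With an eager tensor, the values could be passed as arg_2.numpy().ravel(). A check like this only avoids the crash on the caller's side; it does not change how the kernel handles bad input.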

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np
from tensorflow.python.ops import gen_nn_ops
try:
  try:
    with tf.device('/CPU'):
      arg_0_tensor = tf.constant(-105687333925307, shape=[2, 3, 3, 1], dtype=tf.float32,)
      arg_0 = tf.identity(arg_0_tensor)
      arg_1_tensor = tf.random.uniform([2, 2, 2, 1], dtype=tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_2_tensor = tf.random.uniform([2, 2, 2, 1], minval=-256, maxval=257, dtype=tf.int64)
      arg_2 = tf.identity(arg_2_tensor)
      ksize_0 = 1
      ksize_1 = 2
      ksize_2 = 2
      ksize_3 = 1
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides_0 = 1
      strides_1 = 1
      strides_2 = 1
      strides_3 = 1
      strides = [strides_0,strides_1,strides_2,strides_3,]
      padding = "VALID"
      include_batch_in_index = False
      out = gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
  try:
    with tf.device('/GPU:0'):
      arg_0 = tf.identity(arg_0_tensor)
      arg_0 = tf.cast(arg_0, tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_1 = tf.cast(arg_1, tf.float32)
      arg_2 = tf.identity(arg_2_tensor)
      arg_2 = tf.cast(arg_2, tf.int64)
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides = [strides_0,strides_1,strides_2,strides_3,]
      gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
except Exception as e:
  print("Error:"+str(e))

Relevant log output

2023-01-04 00:07:17.297598: F tensorflow/core/kernels/maxpooling_op.cc:1065] Check failed: grad_out_index >= output_start && grad_out_index < output_end Invalid output gradient index: 240, 0, 18
Aborted
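
Note that the log line starts with F, a fatal CHECK failure, and the process ends with "Aborted". The failure terminates the interpreter itself, so the try/except blocks in the script never run. One way to observe such a crash without losing the calling process is to run the snippet in a child interpreter and inspect its return code. The sketch below assumes that approach; the helper name run_isolated is hypothetical, and the child calls os.abort() as a stand-in for the TensorFlow call.

```python
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child interpreter and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


# Stand-in for the crashing op: os.abort() terminates the child abnormally,
# much like a failed CHECK would.
result = run_isolated("import os; os.abort()")
if result.returncode != 0:
    # On POSIX, a negative return code means the child was killed by that signal.
    print("child crashed, return code:", result.returncode)
```

To apply this to the report, the string passed to run_isolated would be the reproduction script itself, and result.stderr would then hold the log output.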

rryofs0p #1

Hi @nimashiri!
Thank you for reporting this bug in gen_nn_ops.max_pool_grad_with_argmax.
@SuryanarayanaY!
Could you please look into this issue? Gists for 2.10, 2.11, and nightly are attached.
Thank you!

cl25kdpy #2

Hi @nimashiri,

I observed this behavior on the CPU runtime. In the runtime logs, the first entry is the error: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected, followed by the fatal check failure: F tensorflow/core/kernels/maxpooling_op.cc:1076] Check failed: grad_out_index >= output_start && grad_out_index < output_end Invalid output gradient index: 120, 0, 18

Judging by the error message, it tries to check for a GPU, so I ran the same code on the GPU runtime and observed no crash in 2.11 or nightly.

We need to check why this op is supported only on GPU. Could you share your thoughts?

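dwbf0jvd #3

> Hi @nimashiri,
>
> I observed this behavior on the CPU runtime. In the runtime logs, the first entry is the error: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected, followed by the fatal check failure: F tensorflow/core/kernels/maxpooling_op.cc:1076] Check failed: grad_out_index >= output_start && grad_out_index < output_end Invalid output gradient index: 120, 0, 18. Judging by the error message, it tries to check for a GPU, so I ran the same code on the GPU runtime and observed no crash in 2.11 or nightly.
>
> We need to check why this op is supported only on GPU. Could you share your thoughts?

Unfortunately, I have no ideas on this.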

5ssjco0h #4

This:

import tensorflow as tf
import numpy as np
from tensorflow.python.ops import gen_nn_ops
try:
  try:
    with tf.device('/CPU'):
      arg_0_tensor = tf.random.uniform([2, 3, 3, 1], dtype=tf.float32)
      arg_0 = tf.identity(arg_0_tensor)
      arg_1_tensor = tf.random.uniform([2, 2, 2, 1], dtype=tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_2_tensor = tf.random.uniform([2, 2, 2, 1], minval=-256, maxval=257, dtype=tf.int64)
      arg_2 = tf.identity(arg_2_tensor)
      ksize_0 = 1
      ksize_1 = 2
      ksize_2 = 2
      ksize_3 = 1
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides_0 = 1
      strides_1 = 1
      strides_2 = 1
      strides_3 = 1
      strides = [strides_0,strides_1,strides_2,strides_3,]
      padding = "VALID"
      include_batch_in_index = False
      out = gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
  try:
    with tf.device('/GPU:0'):
      arg_0 = tf.identity(arg_0_tensor)
      arg_0 = tf.cast(arg_0, tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_1 = tf.cast(arg_1, tf.float32)
      arg_2 = tf.identity(arg_2_tensor)
      arg_2 = tf.cast(arg_2, tf.int64)
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides = [strides_0,strides_1,strides_2,strides_3,]
      gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
except Exception as e:
  print("Error:"+str(e))

tktrz96b #6

This is still an issue in tf-nightly (2.15.0-dev20231003). A screenshot is attached for reference.
