TPU Tensorflow mapping string label to int with

ddrv8njm  posted 2 months ago in Other

Issue type

Bug

Have you reproduced the bug with TF nightly?

Source

source

TensorFlow version

2.12

Custom code

OS platform and distribution

Colab

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

GPU model and memory

Current behavior?

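I am currently getting an error when trying to fetch a batch from a tf.data dataset. I map the string labels stored in the TFRecord to ints using tf.lookup.StaticHashTable. Because of this error, I cannot get a batch from the dataset or train the model on TPU. The same code runs fine on GPU.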
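
For readers without access to the notebook, the setup described above can be sketched roughly as follows. This is a minimal sketch and not the code from the linked Colab: the label vocabulary, the feature names `x` and `label`, and the helpers `serialize` and `parse` are assumptions made for illustration. It builds the records in memory and runs on CPU or GPU; the TPU strategy setup from the report is omitted.

```python
import tensorflow as tf

# Hypothetical label vocabulary; the labels used in the original notebook are not shown.
LABELS = ["cat", "dog", "bird"]

# Table mapping each string label to its integer index; unknown labels map to -1.
label_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(LABELS),
        values=tf.constant(list(range(len(LABELS))), dtype=tf.int64),
    ),
    default_value=-1,
)


def serialize(feature, label):
    """Builds one serialized tf.train.Example with a float feature and a string label."""
    example = tf.train.Example(features=tf.train.Features(feature={
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=feature)),
        "label": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[label.encode("utf-8")])),
    }))
    return example.SerializeToString()


def parse(record):
    """Parses one record and converts its string label to an int via the table."""
    parsed = tf.io.parse_single_example(record, {
        "x": tf.io.FixedLenFeature([2], tf.float32),
        "label": tf.io.FixedLenFeature([], tf.string),
    })
    return parsed["x"], label_table.lookup(parsed["label"])


# In-memory stand-in for a TFRecord file (assumption; the report reads real TFRecords).
records = [serialize([0.0, 1.0], "dog"), serialize([1.0, 0.0], "cat")]
dataset = tf.data.Dataset.from_tensor_slices(records).map(parse).batch(2)

# Fetching a batch is the step that fails on TPU according to the report.
features, labels = next(iter(dataset))
```

With a real TFRecord file, `tf.data.TFRecordDataset` would take the place of `from_tensor_slices`, and the rest of the pipeline would stay the same.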

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1vAADMl5fBulmSnbmbOTrMjyzAYAgHhFl?authuser=1#scrollTo=_zv9OlXbIqDf

Relevant log output

AttributeError                            Traceback (most recent call last)

/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/iterator_ops.py in _next_internal(self)
    786         # Fast path for the case `self._structure` is not a nested structure.
--> 787         return self._element_spec._from_compatible_tensor_list(ret)  # pylint: disable=protected-access
    788       except AttributeError:

AttributeError: 'tuple' object has no attribute '_from_compatible_tensor_list'

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)

13 frames

InternalError: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused {grpc_status:14, created_time:"2023-04-16T09:15:30.550805248+00:00"}
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)

/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/executor.py in wait(self)
     63   def wait(self):
     64     """Waits for ops dispatched in this executor to finish."""
---> 65     pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
     66 
     67   def clear_error(self):

InternalError: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused {grpc_status:14, created_time:"2023-04-16T09:15:30.550805248+00:00"}
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.

f0brbegy #1

I was able to reproduce this issue on Colab with TF 2.12 and a TPU, while on GPU it works as expected. Please see the TPU and GPU gists for reference.
Thanks!

z4iuyo4d #2

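Hi Shiro-LK,
Currently not all ops can be executed on TPU, and the iter op may be one of them. The error log also says:
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic
So it seems the iter op is not supported on TPU.
You can refer to the list of ops supported on TPU. Please cross-check it against your source code and confirm. If you find anything missing from that list, please get back to us.
Thanks!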

4jb9z9bj #3

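Thanks. I suppose this op is not compatible at the moment, but that is not mentioned in the documentation. I did manage to get it working without changing anything, but it succeeds only about once in every 20 attempts. I am not sure how that is possible. @SuryanarayanaY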
