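Issue type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

TensorFlow version

2.12

Custom code

Yes

OS platform and distribution

Colab

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

None

GPU model and memory

None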

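Current behavior?

I currently get an error when trying to fetch a batch from a tf.data dataset. I map the string labels stored in my TFRecord files to integers using tf.lookup.StaticHashTable. Because of this error, I cannot get a batch from the dataset and cannot train my model on TPU. The same code runs fine on GPU.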
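
For readers who cannot open the notebook, the pattern described above looks roughly like the sketch below. It is not the code from the linked Colab: the label names, the tensor shapes, and the helper name encode are made up for illustration, and in-memory tensors stand in for the TFRecord files. It only shows how tf.lookup.StaticHashTable is typically used inside Dataset.map to turn string labels into integers.

```python
import tensorflow as tf

# Hypothetical label vocabulary; the labels in the original notebook are not shown.
class_names = ["cat", "dog", "bird"]

# Static table mapping each string label to an integer id; unknown labels map to -1.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(class_names),
        values=tf.constant([0, 1, 2], dtype=tf.int64),
    ),
    default_value=-1,
)

def encode(features, label):
    # Hypothetical map function: replaces the string label with its integer id.
    return features, table.lookup(label)

# In-memory stand-in for records parsed from TFRecord files (an assumption).
features = tf.zeros([4, 2], dtype=tf.float32)
labels = tf.constant(["dog", "cat", "bird", "fish"])

ds = tf.data.Dataset.from_tensor_slices((features, labels)).map(encode).batch(2)

# Collect the integer labels batch by batch.
batches = [y.numpy().tolist() for _, y in ds]
print(batches)
```

On CPU or GPU, iterating over this dataset yields batches of integer labels; the report above is that the equivalent iteration fails on a Colab TPU.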

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1vAADMl5fBulmSnbmbOTrMjyzAYAgHhFl?authuser=1#scrollTo=_zv9OlXbIqDf

Relevant log output

AttributeError                            Traceback (most recent call last)

/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/iterator_ops.py in _next_internal(self)
    786         # Fast path for the case `self._structure` is not a nested structure.
--> 787         return self._element_spec._from_compatible_tensor_list(ret)  # pylint: disable=protected-access
    788       except AttributeError:

AttributeError: 'tuple' object has no attribute '_from_compatible_tensor_list'

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)

13 frames

InternalError: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused {grpc_status:14, created_time:"2023-04-16T09:15:30.550805248+00:00"}
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)

/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/executor.py in wait(self)
     63   def wait(self):
     64     """Waits for ops dispatched in this executor to finish."""
---> 65     pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
     66 
     67   def clear_error(self):

InternalError: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused {grpc_status:14, created_time:"2023-04-16T09:15:30.550805248+00:00"}
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.

3 answers

f0brbegy1#

I was able to reproduce this issue on Colab with TF 2.12 and a TPU, but it works as expected on GPU. Please see the TPU and GPU gists for reference.
Thanks!

z4iuyo4d2#

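Hi Shiro-LK,
Not all ops can currently run on TPU, and the iter op may be one of them. In addition, from the error log:
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic
it seems that the iter op is not supported on TPU.
You can refer to the list of ops supported on TPU here. Please cross-check it against your source code and confirm. If you find anything missing there, please reply to us.
Thanks!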

4jb9z9bj3#

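Thanks. I guess this op is not compatible at the moment, but that is not mentioned in the documentation. I did manage to get it working without changing anything, but it only succeeds roughly once in every 20 attempts. I am not sure how that is possible. @SuryanarayanaY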
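
Editorial sketch, not something proposed in this thread: the same string-to-int mapping can be written without a lookup table by comparing each label against a constant vocabulary. Whether this avoids the TPU failure reported here is untested and purely an assumption. The vocabulary and the helper name encode_without_table are hypothetical.

```python
import tensorflow as tf

# Hypothetical label vocabulary, held as a plain constant instead of a table resource.
class_names = tf.constant(["cat", "dog", "bird"])

def encode_without_table(label):
    # Compare the scalar string label with every class name: shape [num_classes].
    matches = tf.equal(label, class_names)
    # Position of the matching class name (0 if nothing matches, fixed up below).
    index = tf.argmax(tf.cast(matches, tf.int64), output_type=tf.int64)
    # Return -1 for labels that are not in the vocabulary.
    return tf.where(tf.reduce_any(matches), index, tf.constant(-1, tf.int64))

labels = tf.constant(["dog", "cat", "bird", "fish"])
ds = tf.data.Dataset.from_tensor_slices(labels).map(encode_without_table).batch(2)

encoded = [b.numpy().tolist() for b in ds]
print(encoded)
```

This costs one comparison per class for every element, so it is only reasonable for small vocabularies.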

TPU Tensorflow mapping string label to int with