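Issue type
Bug
Have you reproduced the bug with TF nightly?
No
Source
source
TensorFlow version
2.12
Custom code
Yes
OS platform and distribution
Colab
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
None
GPU model and memory
None
Current behavior?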
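I'm getting an error when I try to fetch a batch from a tf.data.Dataset. I map the string labels stored in my TFRecords to integers with tf.lookup.StaticHashTable. Because of this lookup, I can't get batches from the dataset or train the model on a TPU. The same pipeline works fine on GPU.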
Standalone code to reproduce the issue
https://colab.research.google.com/drive/1vAADMl5fBulmSnbmbOTrMjyzAYAgHhFl?authuser=1#scrollTo=_zv9OlXbIqDf
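The notebook itself is not reproduced here. The sketch below roughly illustrates the pattern described above. It builds a string-to-int tf.lookup.StaticHashTable and applies it in Dataset.map before batching. Several details are hypothetical: CLASS_NAMES, the tiny in-memory dataset (standing in for the parsed TFRecords), and the -1 default value. It also does not set up a TPU strategy, so it only shows the shape of the pipeline, not the TPU failure.

```python
import tensorflow as tf

# Hypothetical label vocabulary; the real one is in the reporter's notebook.
CLASS_NAMES = ["cat", "dog", "bird"]

# Static string -> int64 table; labels not in the vocabulary map to -1.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(CLASS_NAMES),
        values=tf.range(len(CLASS_NAMES), dtype=tf.int64),
    ),
    default_value=tf.constant(-1, dtype=tf.int64),
)

def encode_label(image, label):
    # Replace the string label with its integer id via the lookup table.
    return image, table.lookup(label)

# Stand-in for the parsed TFRecord dataset: (feature, string label) pairs.
images = tf.zeros([4, 2], dtype=tf.float32)
labels = tf.constant(["dog", "cat", "bird", "dog"])
ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(encode_label).batch(2)

for _, batch_labels in ds:
    print(batch_labels.numpy())
```

On a CPU or GPU runtime this prints the integer label batches [1 0] and [2 1].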
Relevant log output
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/iterator_ops.py in _next_internal(self)
786 # Fast path for the case `self._structure` is not a nested structure.
--> 787 return self._element_spec._from_compatible_tensor_list(ret) # pylint: disable=protected-access
788 except AttributeError:
AttributeError: 'tuple' object has no attribute '_from_compatible_tensor_list'
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
InternalError: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused {grpc_status:14, created_time:"2023-04-16T09:15:30.550805248+00:00"}
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/executor.py in wait(self)
63 def wait(self):
64 """Waits for ops dispatched in this executor to finish."""
---> 65 pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
66
67 def clear_error(self):
InternalError: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:37044: Failed to connect to remote host: Connection refused {grpc_status:14, created_time:"2023-04-16T09:15:30.550805248+00:00"}
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.
3 answers
f0brbegy1#
I was able to reproduce this issue on Colab with TF 2.12 on a TPU runtime, but on GPU it works as expected. Please see the TPU and GPU gists for reference.
Thanks!
z4iuyo4d2#
Hi, Shiro-LK.
Currently, not all ops can be executed on TPU, and the iter operation may be one of them. Also, the error log says:
Executing non-communication op <MakeIterator> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic
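It seems the iter operation is not supported on TPU. You can refer to the list of TPU-supported ops here. Please check your source code carefully and confirm. If you find that we have missed anything, please let us know.
Thanks!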
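The report attributes the failure to the StaticHashTable lookup. One way to check whether the table resource is the trigger is to map the labels with plain tensor ops instead. The sketch below is a hypothetical workaround that has not been verified on TPU. It compares each label against a fixed vocabulary with tf.equal and takes the index of the match with tf.argmax, so no lookup-table resource is created. CLASS_NAMES and the in-memory dataset are placeholders, and every label is assumed to be in the vocabulary.

```python
import tensorflow as tf

# Hypothetical vocabulary; replace with the real class names.
CLASS_NAMES = ["cat", "dog", "bird"]

def encode_label(image, label):
    # Compare the scalar string label with every vocabulary entry, then take
    # the index of the match. Only stateless tensor ops are used, so the map
    # function does not capture a lookup-table resource.
    # Assumption: the label is always in CLASS_NAMES (unknown labels map to 0).
    vocab = tf.constant(CLASS_NAMES)
    matches = tf.cast(tf.equal(vocab, label), tf.int32)
    return image, tf.argmax(matches, output_type=tf.int64)

# Stand-in for the parsed TFRecord dataset: (feature, string label) pairs.
images = tf.zeros([4, 2], dtype=tf.float32)
labels = tf.constant(["dog", "cat", "bird", "dog"])
ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(encode_label).batch(2)

for _, batch_labels in ds:
    print(batch_labels.numpy())
```

Unlike the table version, an unknown label silently maps to 0 here, so validate labels upstream if that matters.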
4jb9z9bj3#
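Thanks. I guess this op isn't compatible for now, but it isn't mentioned in that document. I did manage to get it working without changing anything, but only about once in every 20 attempts. I'm not sure how that is possible. @SuryanarayanaY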