//tensorflow/python/client:session_partial_run_test is flaky

f8rj6qna  posted 2 months ago  in Python

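Issue Type

Bug

Have you reproduced the bug with TF nightly?

Yes

Source

source

Tensorflow Version

git HEAD

Custom Code

OS Platform and Distribution

Ubuntu 20.04

Mobile device

Python version

3.9.16

Bazel version

5.3.0

GCC/Compiler version

10.2.1

CUDA/cuDNN version

GPU model and memory

Current Behaviour?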

The unit test is reported as FLAKY or FAILED.
See https://source.cloud.google.com/results/invocations/dea422ff-7e14-4fc1-b324-0129ecd7ffbc/log or https://github.com/tensorflow/tensorflow/actions/runs/4731924097/jobs/8397430880#step:5:23224

Standalone code to reproduce the issue

docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc test --config=rbe --config=pycpp --config=build_event_export

Relevant log output

======================================================================
ERROR: testPartialRunMissingPlaceholderFeedExceptionDist (__main__.PartialRunTest)
PartialRunTest.testPartialRunMissingPlaceholderFeedExceptionDist
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1379, in _do_call
    return fn(*args)
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1369, in _prun_fn
    return self._call_tf_sessionprun(handle, feed_dict, fetch_list)
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1460, in _call_tf_sessionprun
    return tf_session.TF_SessionPRun_wrapper(self._session, handle, feed_dict,
tensorflow.python.framework.errors_impl.InternalError: From /job:localhost/replica:0/task:0:
ValidateDevices called before initialization.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1629, in decorated
    return f(self, *args, **kwargs)
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session_partial_run_test.py", line 269, in testPartialRunMissingPlaceholderFeedExceptionDist
    self.RunTestPartialRunMissingPlaceholderFeedException(
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session_partial_run_test.py", line 119, in RunTestPartialRunMissingPlaceholderFeedException
    sess.partial_run(handle, fetches[0])
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1026, in partial_run
    return self._run(handle, fetches, feed_dict, None, None)
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1192, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1375, in _do_run
    return self._do_call(_prun_fn, handle, feeds, fetches)
  File "/b/f/w/bazel-out/k8-opt/bin/tensorflow/python/client/session_partial_run_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1398, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

From /job:localhost/replica:0/task:0:
ValidateDevices called before initialization.

----------------------------------------------------------------------
Ran 25 tests in 2.625s

FAILED (errors=1, skipped=1)
================================================================================
mlnl4t2r  #1

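Hi @elfringham,
Thank you for the report. Is this a Docker-specific issue, or can it also be reproduced in a plain Linux environment? If it can, please provide the command(s) needed to reproduce the reported behaviour. Thank you!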

ugmeyewa  #2

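Sorry, but the behaviour is too irregular to draw any conclusion about whether it is specific to Docker or to any other container platform. The data so far shows two failures followed by 12 passing jobs.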
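
To illustrate why two failures followed by 12 passes is too little data, here is a rough Python sketch. It assumes that runs are independent and that the true failure rate equals the observed 2/14, which may not hold for a flaky test on shared CI workers. The function names are hypothetical and are not part of TensorFlow or Bazel.

```python
import math


def observed_failure_rate(failures: int, passes: int) -> float:
    """Fraction of runs that failed (assumes at least one run)."""
    return failures / (failures + passes)


def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all pass at this failure rate."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of independent runs expected to show at least one
    failure with the given confidence. Requires 0 < failure_rate < 1."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Numbers from the report: 2 failures, then 12 passing jobs.
rate = observed_failure_rate(failures=2, passes=12)
print(f"observed failure rate: {rate:.3f}")
print(f"chance of 12 straight passes at that rate: {prob_all_pass(rate, 12):.3f}")
print(f"runs needed to see a failure with 95% confidence: {runs_needed(rate)}")
```

Under these assumptions, a streak of 12 passes is not unusual even if the test is still flaky, so comparing a handful of runs inside and outside Docker would not settle the question.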
