TensorFlow cannot jit_compile and run a function on the CPU when running on a TPU VM

14ifxucb · posted 5 months ago · in Other

Issue type

Bug

Source

Binary

TensorFlow version

v2.11.0-0-gd5b57ca9 2.11.0

Custom code

OS platform and distribution

Linux t1v-n-92ea8b2a-w-0 5.15.0-1022-gcp #29 ~20.04.1-Ubuntu SMP Sat Oct 29 18:17:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Mobile device

No response

Python version

3.8

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

On a v3-8 TPU VM running the tpu-vm-tf-2.11.0 TensorFlow runtime, I cannot run a basic function on the CPU. Please advise how to run a jit_compiled function on the CPU of a TPU VM.

Standalone code to reproduce the issue

import os

os.environ["TPU_NAME"] = "local"
os.environ["TPU_LOAD_LIBRARY"] = "1"

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

print("All devices: ", tf.config.list_logical_devices())

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
b = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

@tf.function(jit_compile=True)
def jit_test(a, b):
    c = tf.matmul(a, b)
    return a + b + c

with tf.device("/TPU:0"):
    print(jit_test(a, b))
    print("Success!")

with tf.device("/CPU:0"):
    print(jit_test(a, b))  # This will fail
    print("Will crash prior to getting here")

Relevant log output

2022-12-19 10:13:18.533185: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-19 10:13:18.717772: I tensorflow/core/tpu/tpu_initializer_helper.cc:275] Libtpu path is: libtpu.so
D1219 10:13:18.874059745   34863 config.cc:113]              gRPC EXPERIMENT tcp_frame_size_tuning               OFF (default:OFF)
D1219 10:13:18.874080489   34863 config.cc:113]              gRPC EXPERIMENT tcp_read_chunks                     OFF (default:OFF)
D1219 10:13:18.874093652   34863 config.cc:113]              gRPC EXPERIMENT tcp_rcv_lowat                       OFF (default:OFF)
D1219 10:13:18.874100741   34863 config.cc:113]              gRPC EXPERIMENT peer_state_based_framing            OFF (default:OFF)
D1219 10:13:18.874107419   34863 config.cc:113]              gRPC EXPERIMENT flow_control_fixes                  OFF (default:OFF)
D1219 10:13:18.874114099   34863 config.cc:113]              gRPC EXPERIMENT memory_pressure_controller          OFF (default:OFF)
D1219 10:13:18.874121059   34863 config.cc:113]              gRPC EXPERIMENT periodic_resource_quota_reclamation ON  (default:ON)
D1219 10:13:18.874127645   34863 config.cc:113]              gRPC EXPERIMENT unconstrained_max_quota_buffer_size OFF (default:OFF)
D1219 10:13:18.874134219   34863 config.cc:113]              gRPC EXPERIMENT new_hpack_huffman_decoder           OFF (default:OFF)
D1219 10:13:18.874140862   34863 config.cc:113]              gRPC EXPERIMENT event_engine_client                 OFF (default:OFF)
D1219 10:13:18.874147728   34863 config.cc:113]              gRPC EXPERIMENT monitoring_experiment               ON  (default:ON)
D1219 10:13:18.874154168   34863 config.cc:113]              gRPC EXPERIMENT promise_based_client_call           OFF (default:OFF)
I1219 10:13:18.874398506   34863 ev_epoll1_linux.cc:121]     grpc epoll fd: 6
D1219 10:13:18.874414089   34863 ev_posix.cc:141]            Using polling engine: epoll1
D1219 10:13:18.874434253   34863 dns_resolver_ares.cc:824]   Using ares dns resolver
D1219 10:13:18.874733217   34863 lb_policy_registry.cc:45]   registering LB policy factory for "priority_experimental"
D1219 10:13:18.874748219   34863 lb_policy_registry.cc:45]   registering LB policy factory for "outlier_detection_experimental"
D1219 10:13:18.874756312   34863 lb_policy_registry.cc:45]   registering LB policy factory for "weighted_target_experimental"
D1219 10:13:18.874763493   34863 lb_policy_registry.cc:45]   registering LB policy factory for "pick_first"
D1219 10:13:18.874770671   34863 lb_policy_registry.cc:45]   registering LB policy factory for "round_robin"
D1219 10:13:18.874783165   34863 lb_policy_registry.cc:45]   registering LB policy factory for "ring_hash_experimental"
D1219 10:13:18.874810477   34863 lb_policy_registry.cc:45]   registering LB policy factory for "grpclb"
D1219 10:13:18.874843143   34863 lb_policy_registry.cc:45]   registering LB policy factory for "rls_experimental"
D1219 10:13:18.874864810   34863 lb_policy_registry.cc:45]   registering LB policy factory for "xds_cluster_manager_experimental"
D1219 10:13:18.874872835   34863 lb_policy_registry.cc:45]   registering LB policy factory for "xds_cluster_impl_experimental"
D1219 10:13:18.874880753   34863 lb_policy_registry.cc:45]   registering LB policy factory for "cds_experimental"
D1219 10:13:18.874888414   34863 lb_policy_registry.cc:45]   registering LB policy factory for "xds_cluster_resolver_experimental"
D1219 10:13:18.874895665   34863 certificate_provider_registry.cc:35] registering certificate provider factory for "file_watcher"
I1219 10:13:18.895383666   34863 socket_utils_common_posix.cc:336] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
2022-12-19 10:13:18.913051: I tensorflow/core/tpu/tpu_initializer_helper.cc:225] GetTpuPjrtApi not found
2022-12-19 10:13:21.766915: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-19 10:13:26.260445: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x63c6900 initialized for platform TPU (this does not guarantee that XLA will be used). Devices:
2022-12-19 10:13:26.260485: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): TPU, 2a886c8
2022-12-19 10:13:26.260499: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (1): TPU, 2a886c8
2022-12-19 10:13:26.260511: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (2): TPU, 2a886c8
2022-12-19 10:13:26.260524: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (3): TPU, 2a886c8
2022-12-19 10:13:26.260536: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (4): TPU, 2a886c8
2022-12-19 10:13:26.260549: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (5): TPU, 2a886c8
2022-12-19 10:13:26.260561: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (6): TPU, 2a886c8
2022-12-19 10:13:26.260573: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (7): TPU, 2a886c8
All devices:  [LogicalDevice(name='/device:CPU:0', device_type='CPU'), LogicalDevice(name='/device:TPU_SYSTEM:0', device_type='TPU_SYSTEM'), LogicalDevice(name='/device:TPU:0', device_type='TPU'), LogicalDevice(name='/device:TPU:1', device_type='TPU'), LogicalDevice(name='/device:TPU:2', device_type='TPU'), LogicalDevice(name='/device:TPU:3', device_type='TPU'), LogicalDevice(name='/device:TPU:4', device_type='TPU'), LogicalDevice(name='/device:TPU:5', device_type='TPU'), LogicalDevice(name='/device:TPU:6', device_type='TPU'), LogicalDevice(name='/device:TPU:7', device_type='TPU')]
input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2022-12-19 10:13:26.286675: I tensorflow/core/common_runtime/placer.cc:114] input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:CPU:0
2022-12-19 10:13:26.286733: I tensorflow/core/common_runtime/placer.cc:114] _EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:CPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
2022-12-19 10:13:26.286753: I tensorflow/core/common_runtime/placer.cc:114] output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
2022-12-19 10:13:26.287893: I tensorflow/core/common_runtime/eager/execute.cc:1445] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:CPU:0
2022-12-19 10:13:26.288240: I tensorflow/core/common_runtime/eager/execute.cc:1445] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:CPU:0
2022-12-19 10:13:26.361102: I tensorflow/core/common_runtime/eager/execute.cc:1445] Executing op __inference_jit_test_11 in device /job:localhost/replica:0/task:0/device:TPU:0
2022-12-19 10:13:26.473898: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
tf.Tensor(
[[ 32.  40.  48.]
 [ 74.  91. 108.]
 [116. 142. 168.]], shape=(3, 3), dtype=float32)
Success!
2022-12-19 10:13:26.477187: I tensorflow/core/common_runtime/eager/execute.cc:1445] Executing op __inference_jit_test_11 in device /job:localhost/replica:0/task:0/device:CPU:0
2022-12-19 10:13:26.478142: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:417 : NOT_FOUND: could not find registered transfer manager for platform Host -- check target linkage
Traceback (most recent call last):
  File "notebooks/tpu_vm_test.py", line 28, in <module>
    print(jit_test(a, b))  # This will fail
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: could not find registered transfer manager for platform Host -- check target linkage [Op:__inference_jit_test_11]
D1219 10:13:26.848936447   34863 init.cc:190]                grpc_shutdown starts clean-up now
zvms9eto #1

@sushreebarsa,
I was able to reproduce this issue on TensorFlow v2.9, v2.11, and the nightly build. Please see the overview in the gist here.

guz6ccqo #2

Could you please provide an update on this? Thank you!

23c0lvtd #3

@tilakrayal @sushreebarsa Could you please take a look at this issue? If you are not the right assignee, could we ask another contributor to help resolve it? Thank you.

6tr1vspr #4

The code runs correctly in a GPU environment, as shown in the attached gist, and also runs correctly on Colab and on a VM with a GPU, as shown in the screenshot below.

@sachinprasadhs Could you please look into this issue? I don't have a TPU environment to reproduce it.

beq87vna #5

Please try CentralStorageStrategy, which enables placing variables on the CPU while using a TPU strategy. This creates a CentralStorageStrategy instance that uses all visible GPUs and the CPU; updates to variables on the replicas are aggregated before being applied to the variables.
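A minimal sketch of that suggestion, reusing the reporter's jit_test function. tf.distribute.experimental.CentralStorageStrategy and strategy.run are standard TensorFlow APIs, but whether this setup avoids the transfer-manager error on a TPU VM is an assumption, not something verified in this thread.

import tensorflow as tf

# Sketch only: CentralStorageStrategy keeps variables on the CPU and runs
# compute on all visible accelerators (falling back to the CPU if none).
strategy = tf.distribute.experimental.CentralStorageStrategy()

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
b = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

@tf.function(jit_compile=True)
def jit_test(a, b):
    c = tf.matmul(a, b)
    return a + b + c

with strategy.scope():
    # strategy.run executes the function on the strategy's compute devices.
    print(strategy.run(jit_test, args=(a, b)))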

ctzwtxfj #6

This issue has been automatically marked as stale because it has had no recent activity. It will be closed if no further activity occurs. Thank you.

cyvaqqii #7

Closing as stale. Please reopen if you would like to work on this further.

ycggw6v2 #8

Are you satisfied with the resolution of your issue?

eufgjt7s #9

Please reopen. The suggested solution does not work on TPU pods. Please try to resolve this on a pod (I cannot provide a Colab example for a pod). This should work with TPUStrategy.
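For reference, a minimal sketch of the TPUStrategy setup that comment refers to, using the standard tf.distribute TPU APIs on a TPU VM. The tpu="local" argument is an assumption taken from the reporter's TPU_NAME setting, and this sketch does not address the CPU-side jit_compile failure itself.

import tensorflow as tf

# Sketch only: standard TPUStrategy initialization on a TPU VM.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

@tf.function(jit_compile=True)
def jit_test(a, b):
    return a + b + tf.matmul(a, b)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# Runs the XLA-compiled function replicated across the TPU cores; the
# original CPU placement failure is unchanged by this.
print(strategy.run(jit_test, args=(a, b)))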
