Python InternalError: Graph execution error: TensorFlow EfficientDet D0

fnatzsnv  posted on 2023-06-28 in Python

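I am currently trying to perform transfer learning with the EfficientDet D0 model from the TensorFlow Applications module. My goal is to train this model on the Food101 dataset for an object detection task. However, whenever I try to run the code, I get the following error.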

Saving TensorBoard log files to: training_logs/efficientnetb0_101_classes_all_data_feature_extract/20230627-212648
Epoch 1/3
2023-06-27 21:26:49.281247: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:561] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-06-27 21:26:49.281917: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:629 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-27 21:26:49.283370: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:561] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-06-27 21:26:49.283918: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:629 : INTERNAL: libdevice not found at ./libdevice.10.bc
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Cell In[29], line 5
      1 # Turn off all warnings except for errors
      2 # tf.get_logger().setLevel('ERROR')
      3 
      4 # Fit the model with callbacks
----> 5 history_101_food_classes_feature_extract = model.fit(train_data,epochs=3,steps_per_epoch=len(train_data),validation_data=test_data,validation_steps=int(0.15 * len(test_data)),
      6                                                      callbacks=[create_tensorboard_callback("training_logs", 
      7                                                                                             "efficientnetb0_101_classes_all_data_feature_extract"),
      8                                                                 model_checkpoint])

File ~/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/.local/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:53, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     51 try:
     52   ctx.ensure_initialized()
---> 53   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     54                                       inputs, attrs, num_outputs)
     55 except core._NotOkStatusException as e:
     56   if name is not None:

InternalError: Graph execution error:

Detected at node cond_1/Adam/StatefulPartitionedCall defined at (most recent call last):

However, the same code runs fine on Google Colab, which is weird. Any help would be greatly appreciated.

ia2d9nvy  #1

Thanks to @Dev Bhuyan for the input; it does work. Here is some more context.
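The statement tf.config.run_functions_eagerly(True) tells TensorFlow to run functions in eager execution mode.
By default, TensorFlow runs the model as a computation graph: operations are first added to a graph and then executed (inside a session in TensorFlow 1.x; in TensorFlow 2.x, model.fit traces the training step into a graph via tf.function). Eager execution, by contrast, runs each operation immediately and returns its result. In my case, this resolved the graph execution error.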
Reference: https://www.tensorflow.org/api_docs/python/tf/compat/v1/enable_eager_execution
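
For anyone who wants to see the effect of the setting in isolation, here is a minimal sketch. It assumes TensorFlow 2.x is installed; the function square_sum is a made-up example and has nothing to do with the model in the question. The call has to be made before model.fit so that the training step is not traced into a graph.

```python
import tensorflow as tf

# Force every tf.function to run eagerly (op by op) instead of as a traced graph.
# In a training script, call this before building and fitting the model.
tf.config.run_functions_eagerly(True)


@tf.function
def square_sum(x):
    # Hypothetical example function. With the setting above, this body runs as
    # ordinary Python on each call, so tf.executing_eagerly() is True in here.
    print("executing eagerly inside tf.function:", tf.executing_eagerly())
    return tf.reduce_sum(x * x)


total = square_sum(tf.constant([1.0, 2.0, 3.0]))
print(float(total))  # 1 + 4 + 9 = 14.0
```

Eager execution is usually noticeably slower than graph execution, so this is probably best treated as a workaround or debugging aid. Keras also accepts run_eagerly=True in model.compile, which limits the effect to a single model. The libdevice lines from xla_ops.cc in the log suggest that the failure happens during XLA compilation, which would explain why skipping graph execution avoids it, but that is an inference from the log and not something confirmed here.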
