TensorFlow error on MacBook M1 Pro: NotFoundError: Graph execution error

Posted by pdkcd3nj on 2022-12-27 in Mac

I installed TensorFlow on a MacBook Pro M1 Max Pro, first using Anaconda to install the dependencies:

conda install -c apple tensorflow-deps

Next, I installed the TensorFlow distribution built specifically for the M1 architecture, along with a plugin that lets it use the Metal GPU:

pip install tensorflow-metal tensorflow-macos
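
Before going further, it can help to record exactly which versions pip installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and it only sees packages with pip-style metadata, so it may not report packages installed solely through conda, such as tensorflow-deps.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution has no metadata visible to this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names are the two installed with pip above.
    found = installed_versions(["tensorflow-macos", "tensorflow-metal"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same conda environment that is used for training, since the result depends on which interpreter executes it.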

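I then wrote a very simple feed-forward architecture with some dummy training and validation data, to see whether a training session would run: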

from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import layers
import numpy as np

model = Sequential([layers.Input((3, 1)),
                    layers.LSTM(64),
                    layers.Dense(32, activation='relu'),
                    layers.Dense(32, activation='relu'),
                    layers.Dense(1)])

model.compile(loss='mse',
              optimizer=Adam(learning_rate=0.001),
              metrics=['mean_absolute_error'])

X_train = np.random.rand(100,3)
y_train = np.random.rand(100)
X_val = np.random.rand(100,3)
y_val = np.random.rand(100)
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)

When I run this, I get the following error:

File ~/test.py:20
     18 X_val = np.random.rand(100,3)
     19 y_val = np.random.rand(100)
---> 20 model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)

File ~/anaconda3/envs/cv/lib/python3.8/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/anaconda3/envs/cv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_7' defined at (most recent call last):
    File "/Users/rphan/anaconda3/envs/cv/bin/ipython", line 8, in <module>
      sys.exit(start_ipython())
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/__init__.py", line 123, in start_ipython
      return launch_new_instance(argv=argv, **kwargs)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/traitlets/config/application.py", line 1041, in launch_instance
      app.start()
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/terminal/ipapp.py", line 318, in start
      self.shell.mainloop()
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/terminal/interactiveshell.py", line 685, in mainloop
      self.interact()
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/terminal/interactiveshell.py", line 678, in interact
      self.run_cell(code, store_history=True)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2940, in run_cell
      result = self._run_cell(
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2995, in _run_cell
      return runner(coro)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
      coro.send(None)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3194, in run_cell_async
      has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3373, in run_ast_nodes
      if await self.run_code(code, result, async_=asy):
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "<ipython-input-2-0ed839f9b556>", line 1, in <module>
      get_ipython().run_line_magic('run', 'test.py')
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2364, in run_line_magic
      result = fn(*args, **kwargs)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/magics/execution.py", line 829, in run
      run()
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/magics/execution.py", line 814, in run
      runner(filename, prog_ns, prog_ns,
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2797, in safe_execfile
      py3compat.execfile(
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/IPython/utils/py3compat.py", line 55, in execfile
      exec(compiler(f.read(), fname, "exec"), glob, loc)
    File "/Users/rphan/test.py", line 20, in <module>
      model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/Users/rphan/anaconda3/envs/cv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_7'
could not find registered platform with id: 0x1056be9e0
     [[{{node StatefulPartitionedCall_7}}]] [Op:__inference_train_function_4146]

I don't know what these errors mean. Has anyone seen them before? This seems like a very simple network, and I can't figure out why training won't run.

z31licg0 #1

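After a lot of searching, this turned out to be a dependency conflict between the packages installed by Anaconda and the TensorFlow version installed through pip: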

conda list | grep tensorflow

tensorflow-deps           2.9.0                          0    apple
tensorflow-estimator      2.11.0                    pypi_0    pypi
tensorflow-macos          2.11.0                    pypi_0    pypi
tensorflow-metal          0.7.0                     pypi_0    pypi

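The TensorFlow version I had installed (2.11.0) did not match the tensorflow-deps version (2.9.0), which caused the error. The solution is to downgrade tensorflow-macos to the same version as the dependencies, and to downgrade tensorflow-metal as well: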

pip install tensorflow-metal==0.5.0
pip install tensorflow-macos==2.9.0

After doing this and running the example code again, training succeeded.
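
The mismatch described above can also be checked programmatically. This is a rough sketch, not an official tool: it parses text in the format printed by `conda list | grep tensorflow` and compares the major.minor versions of tensorflow-deps and tensorflow-macos. The function names are hypothetical, and treating "same major.minor" as the compatibility rule is an assumption based on the fix in this answer.

```python
def parse_conda_list(text):
    """Return {package: version} from lines of `conda list` output."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' header lines conda prints.
        if len(fields) >= 2 and not line.lstrip().startswith("#"):
            versions[fields[0]] = fields[1]
    return versions


def major_minor(version):
    """Reduce a version string such as '2.11.0' to the tuple (2, 11)."""
    return tuple(int(part) for part in version.split(".")[:2])


def versions_match(versions, deps="tensorflow-deps", framework="tensorflow-macos"):
    """Assumed rule: the two packages should share the same major.minor version."""
    return major_minor(versions[deps]) == major_minor(versions[framework])


# The listing from this answer, before the downgrade.
listing = """\
tensorflow-deps           2.9.0                          0    apple
tensorflow-estimator      2.11.0                    pypi_0    pypi
tensorflow-macos          2.11.0                    pypi_0    pypi
tensorflow-metal          0.7.0                     pypi_0    pypi
"""

versions = parse_conda_list(listing)
print(versions_match(versions))  # False: 2.9 vs 2.11
```

The check does not cover tensorflow-metal, whose version numbers follow a separate scheme; the only pairing known from this answer is tensorflow-metal 0.5.0 with tensorflow-macos 2.9.0.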
