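System information

Custom code: none, using the upstream benchmark
OS: Android 12
Device: Google Pixel 4a
TensorFlow version: nightly benchmark build (the URL does not specify an exact version)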

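Steps to reproduce

Enable developer options and USB debugging.
Run the following script. It downloads the TF benchmark and a model.
Note that we have an internal YOLOv4 model that we cannot share, so I found an existing model to use instead. The results are more or less the same with both, so I suspect the problem lies in the model architecture. Also, the model runs fine on CPU/NPU.
Results: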
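
To compare backends on the same device, the benchmark can be re-run with the GPU delegate switched off. The following is a minimal sketch, not the commands actually used for this report: it assumes the binary and model are already in /data/local/tmp under the names used by the reproduction script, and it assumes the "NPU" run corresponds to the benchmark tool's `--use_nnapi` flag. The helper `benchmark_cmd` is hypothetical and only prints the command line, so nothing runs on the device until the output is executed.

```shell
#!/bin/bash
set -euo pipefail

DEVICE_PATH="/data/local/tmp"
BENCHMARK_FILE_NAME="tensorflow-benchmark"
MODEL_FILE_NAME="model.tflite"

# Hypothetical helper: print the adb command for one backend (cpu, gpu, or nnapi).
benchmark_cmd() {
  local backend="$1"
  local flags
  case "${backend}" in
    cpu)   flags="--use_gpu=false" ;;
    gpu)   flags="--use_gpu=true" ;;
    nnapi) flags="--use_nnapi=true" ;;  # assumption: the "NPU" run went through NNAPI
    *)     echo "unknown backend: ${backend}" >&2; return 1 ;;
  esac
  echo "adb shell taskset f0 ${DEVICE_PATH}/${BENCHMARK_FILE_NAME} --graph=${DEVICE_PATH}/${MODEL_FILE_NAME} ${flags}"
}

# Print the command for each backend; nothing is executed on the device here.
for backend in cpu gpu nnapi; do
  benchmark_cmd "${backend}"
done
```

Each printed line can be pasted into a terminal, or run directly with `benchmark_cmd cpu | sh`, to see whether the errors appear only with `--use_gpu=true`.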

Expected: no error messages.
Actual: many error messages, apparently one for each model node. See the output below.

#!/bin/bash
set -euo pipefail

MODEL_FILE_NAME="model.tflite"

BENCHMARK_PATH="$(mktemp -d)"
BENCHMARK_FILE_NAME="tensorflow-benchmark"

DEVICE_PATH="/data/local/tmp"

echo ":: Fetch benchmark..."
curl \
  --location "https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/android_aarch64_benchmark_model" \
  --output "${BENCHMARK_PATH}/${BENCHMARK_FILE_NAME}"

echo ":: Fetch model..."
curl \
  --location "https://github.com/theAIGuysCode/tensorflow-yolov4-tflite/raw/master/android/app/src/main/assets/yolov4-416-fp32.tflite" \
  --output "${MODEL_FILE_NAME}"

echo ":: Move benchmark to the device..."
adb push "${BENCHMARK_PATH}/${BENCHMARK_FILE_NAME}" "${DEVICE_PATH}"
adb shell chmod +x "${DEVICE_PATH}/${BENCHMARK_FILE_NAME}"

echo ":: Move model to the device..."
adb push "${MODEL_FILE_NAME}" "${DEVICE_PATH}"

echo ":: Run benchmark..."
adb shell taskset f0 "${DEVICE_PATH}/${BENCHMARK_FILE_NAME}" \
  --graph="${DEVICE_PATH}/${MODEL_FILE_NAME}" \
  --use_gpu=true

echo ":: Remove benchmark..."
adb shell rm "${DEVICE_PATH}/${BENCHMARK_FILE_NAME}"
rm -rf "${BENCHMARK_PATH}"

echo ":: Remove model..."
adb shell rm "${DEVICE_PATH}/${MODEL_FILE_NAME}"
rm -rf "${MODEL_FILE_NAME}"

:: Fetch benchmark...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6029k  100 6029k    0     0  11.8M      0 --:--:-- --:--:-- --:--:-- 11.8M
:: Fetch model...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   196  100   196    0     0    336      0 --:--:-- --:--:-- --:--:--   335
100 23.1M  100 23.1M    0     0  8255k      0  0:00:02  0:00:02 --:--:-- 23.7M
:: Move benchmark to the device...
/var/folders/d8/zmkczjms4jxbtbw24wt7qzbw0000gp/T/tmp.Yg5JHGh1/tens...ark: 1 file pushed, 0 skipped. 99.8 MB/s (6174376 bytes in 0.059s)
:: Move model to the device...
model.tflite: 1 file pushed, 0 skipped. 36.6 MB/s (24279948 bytes in 0.632s)
:: Run benchmark...
STARTING!
Log parameter values verbosely: [0]
Graph: [/data/local/tmp/model.tflite]
Use gpu: [1]
Loaded model /data/local/tmp/model.tflite
INFO: Initialized TensorFlow Lite runtime.
GPU delegate created.
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Replacing 144 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 24.2799
Initialized session in 2394.17ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
ERROR: TfLiteGpuDelegate Invoke: Given object is not valid
ERROR: Node number 144 (TfLiteGpuDelegateV2) failed to invoke.
ERROR: TfLiteGpuDelegate Invoke: Given object is not valid
ERROR: Node number 144 (TfLiteGpuDelegateV2) failed to invoke.
ERROR: TfLiteGpuDelegate Invoke: Given object is not valid
ERROR: Node number 144 (TfLiteGpuDelegateV2) failed to invoke.
ERROR: TfLiteGpuDelegate Invoke: Given object is not valid
ERROR: Node number 144 (TfLiteGpuDelegateV2) failed to invoke.
ERROR: TfLiteGpuDelegate Invoke: Given object is not valid
ERROR: Node number 144 (TfLiteGpuDelegateV2) failed to invoke.
ERROR: TfLiteGpuDelegate Invoke: Given object is not valid
>>> THIS CONTINUES FOR A WHILE <<<
count=852 first=237 curr=175 min=17 max=381 avg=176.995 std=47

Benchmarking failed.
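
Since the same two error lines repeat on every invocation, the log is easier to scan when collapsed into counts. This is a small sketch under the assumption that the benchmark output is piped in or saved to a file. `summarize_errors` is a hypothetical helper, not part of the benchmark tool.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: read benchmark output on stdin and print each
# distinct ERROR line once, prefixed by the number of times it occurred.
# "|| true" keeps the pipeline from failing when no ERROR lines are present.
summarize_errors() {
  grep '^ERROR:' | sort | uniq -c | sort -rn || true
}

# Example input: three lines copied from the output above.
printf '%s\n' \
  'ERROR: TfLiteGpuDelegate Invoke: Given object is not valid' \
  'ERROR: Node number 144 (TfLiteGpuDelegateV2) failed to invoke.' \
  'ERROR: TfLiteGpuDelegate Invoke: Given object is not valid' \
  | summarize_errors
```

Against a real run, the benchmark command from the script can be piped in with `2>&1 | summarize_errors` so that messages written to stderr are counted as well.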