NNAPI on Android 11 fails with MoveNet fp16 and int8 TFLite models

Please make sure that this is an issue related to performance of TensorFlow.
As per our
GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and
build/installation issues on GitHub.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): aarch64 Android 11
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below):
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the current behavior
I am having issues running MoveNet TFLite models on Android 11 with NNAPI 1.3.
The MoveNet models used are sourced from TF Hub:

  1. https://tfhub.dev/google/lite-model/movenet/singlepose/lightning/tflite/float16/4
  2. https://tfhub.dev/google/lite-model/movenet/singlepose/lightning/tflite/int8/4

The two singlepose MoveNet Lightning TFLite models above are float16 and int8 respectively, and I was trying to benchmark them using the prebuilt benchmark tool for android_aarch64 from the TFLite website:
https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/android_aarch64_benchmark_model

I can benchmark the models on CPU and GPU without problems, but when I try to run them on NNAPI, the benchmarking fails. This is interesting because even if the model is not supported by the NNAPI delegate, it should fall back to CPU, which is not happening. Strangely, these models fail to execute on NNAPI CPU (nnapi-reference) as well.
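To see exactly which node the NNAPI delegate rejects, the benchmark tool's per-op profiling flag can help. A minimal sketch, assuming the binary sits in the current directory and that --enable_op_profiling is available in this build of the tool:

```shell
# Sketch: re-run the failing NNAPI benchmark with per-op profiling enabled,
# so the tool reports which operation the delegate rejects.
BENCH=./android_aarch64_benchmark_model
MODEL=lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite

CMD="$BENCH --graph=$MODEL --use_nnapi=true \
  --nnapi_accelerator_name=nnapi-reference --enable_op_profiling=true"
echo "$CMD"

# Only execute on a device where the benchmark binary is actually present.
if [ -x "$BENCH" ]; then
  $CMD
fi
```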

log for lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite:

$ ./android_aarch64_benchmark_model --graph=lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite --use_nnapi=1 --nnapi_accelerator_name=nnapi-reference
STARTING!
Log parameter values verbosely: [0]
Graph: [lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite]
Use NNAPI: [1]
NNAPI accelerator name: [nnapi-reference]
NNAPI accelerators available: [nnapi-reference]
Loaded model lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for NNAPI.
ERROR: NN API returned error ANEURALNETWORKS_BAD_DATA at line 992 while adding operation.
ERROR: Node number 303 (TfLiteNnapiDelegate) failed to prepare.
ERROR: Restored original execution plan after delegate application failure.
Failed to apply NNAPI delegate.
Benchmarking failed.

Node 303 is a GatherNd node, but I am not sure why it fails at this node, because two more GatherNd nodes come before node 303.
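One way to narrow down which delegated partition contains the failing GatherNd node is to cap how much of the graph NNAPI takes. A sketch, assuming the tool supports the --max_delegated_partitions option (present in recent benchmark builds):

```shell
# Sketch: step up the number of partitions NNAPI is allowed to take.
# If the benchmark succeeds at N=1 but fails at a larger N, the failing
# op lives in one of the later partitions.
BENCH=./android_aarch64_benchmark_model
MODEL=lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite

for N in 1 2 3; do
  CMD="$BENCH --graph=$MODEL --use_nnapi=true --max_delegated_partitions=$N"
  echo "$CMD"
  if [ -x "$BENCH" ]; then
    $CMD
  fi
done
```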

log for lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite:

$ ./android_aarch64_benchmark_model --graph=lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite --use_nnapi=1 --nnapi_accelerator_name=nnapi-reference
STARTING!
Log parameter values verbosely: [0]
Graph: [lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite]
Use NNAPI: [1]
NNAPI accelerator name: [nnapi-reference]
NNAPI accelerators available: [nnapi-reference]
Loaded model lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for NNAPI.
ERROR: NN API returned error ANEURALNETWORKS_BAD_DATA at line 992 while adding operation.
ERROR: Node number 162 (TfLiteNnapiDelegate) failed to prepare.
ERROR: Restored original execution plan after delegate application failure.
Failed to apply NNAPI delegate.
Benchmarking failed.

Node 162 for this model is a separable_conv2d/bias node.

The logcat files for both runs are attached in the logs section.
If I remove '--nnapi_accelerator_name=nnapi-reference' or add '--nnapi_allow_fp16=true', I still get the same benchmarking failure as above.
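For reference, the attached logcat files can be captured with something like the following sketch; the /data/local/tmp paths and the grep filter for NNAPI-related lines are assumptions, not part of the original setup:

```shell
# Sketch: clear logcat, re-run the failing benchmark on the device over adb,
# then dump only the NNAPI/NeuralNetworks-related log lines to a file.
MODEL=lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite
LOG="${MODEL}_android_nnapi_logcat.txt"
echo "capturing NNAPI logcat to $LOG"

if command -v adb >/dev/null 2>&1; then
  adb logcat -c
  adb shell "/data/local/tmp/android_aarch64_benchmark_model \
    --graph=/data/local/tmp/$MODEL --use_nnapi=true \
    --nnapi_accelerator_name=nnapi-reference" || true
  adb logcat -d | grep -iE 'nnapi|neuralnetworks' > "$LOG"
fi
```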

These MoveNet models work well on CPU and GPU:

  1. lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite CPU 4 threads:

$ ./android_aarch64_benchmark_model --graph=lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite --num_threads=4
STARTING!
Log parameter values verbosely: [0]
Num threads: [4]
Graph: [lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite]
#threads used for CPU inference: [4]
Loaded model lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite
INFO: Initialized TensorFlow Lite runtime.
The input model file size (MB): 2.89484
Initialized session in 4.574ms.

  2. lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite GPU:

$ ./android_aarch64_benchmark_model --graph=lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite --use_gpu=1
STARTING!
Log parameter values verbosely: [0]
Graph: [lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite]
Use gpu: [1]
Loaded model lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Following operations are not supported by GPU delegate:
ARG_MAX: Operation is not supported.
CAST: Operation is not supported.
CONCATENATION: OP is supported, but tensor type isn't matched!
FLOOR_DIV: Operation is not supported.
GATHER_ND: Operation is not supported.
MUL: OP is supported, but tensor type isn't matched!
PACK: OP is supported, but tensor type isn't matched!
RESHAPE: OP is supported, but tensor type isn't matched!
SUB: OP is supported, but tensor type isn't matched!
UNPACK: Operation is not supported.
100 operations will run on the GPU, and the remaining 57 operations will run on the CPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
The input model file size (MB): 2.89484
Initialized session in 2957.12ms.

  3. lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite CPU 4 threads:

$ ./android_aarch64_benchmark_model --graph=lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite --num_threads=4
STARTING!
Log parameter values verbosely: [0]
Num threads: [4]
Graph: [lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite]
#threads used for CPU inference: [4]
Loaded model lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite
INFO: Initialized TensorFlow Lite runtime.
The input model file size (MB): 4.75851
Initialized session in 3.211ms.

  4. lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite GPU:

$ ./android_aarch64_benchmark_model --graph=lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite --use_gpu=1
STARTING!
Log parameter values verbosely: [0]
Graph: [lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite]
Use gpu: [1]
Loaded model lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Following operations are not supported by GPU delegate:
ARG_MAX: Operation is not supported.
CAST: Operation is not supported.
CONCATENATION: OP is supported, but tensor type isn't matched!
DEQUANTIZE:
FLOOR_DIV: Operation is not supported.
GATHER_ND: Operation is not supported.
MUL: OP is supported, but tensor type isn't matched!
PACK: OP is supported, but tensor type isn't matched!
RESHAPE: OP is supported, but tensor type isn't matched!
SUB: OP is supported, but tensor type isn't matched!
UNPACK: Operation is not supported.
245 operations will run on the GPU, and the remaining 52 operations will run on the CPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
The input model file size (MB): 4.75851
Initialized session in 2013.4ms.

To make sure that NNAPI works for other models, I used MobileNetV2 fp16 and int8 models from TF Hub:

  1. mobilenetv2-coco_fp16 : https://tfhub.dev/sayakpaul/lite-model/mobilenetv2-coco/fp16/1
  2. mobilenetv2-coco_int8 : https://tfhub.dev/sayakpaul/lite-model/mobilenetv2-coco/int8/1

and I face no issues running them on NNAPI CPU.

output for mobilenetv2-coco/fp16 for NNAPI CPU:

$ ./android_aarch64_benchmark_model --graph=lite-model_mobilenetv2-coco_fp16_1.tflite --use_nnapi=1 --nnapi_accelerator_name=nnapi-reference
STARTING!
Log parameter values verbosely: [0]
Graph: [lite-model_mobilenetv2-coco_fp16_1.tflite]
Use NNAPI: [1]
NNAPI accelerator name: [nnapi-reference]
NNAPI accelerators available: [nnapi-reference]
Loaded model lite-model_mobilenetv2-coco_fp16_1.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for NNAPI.
Explicitly applied NNAPI delegate, and the model graph will be partially executed by the delegate w/ 11 delegate kernels.
The input model file size (MB): 4.2551
Initialized session in 99.601ms.

So there is something in the MoveNet models that makes them fail when using NNAPI instead of falling back to CPU. One reason I can think of, after analysing the log file, is the tensor type mismatch for the CONCATENATION op, but I am not sure.

Describe the expected behavior
The MoveNet models, even if not entirely supported on NNAPI, should fall back to CPU. If fallback is disabled but the graph is forced through NNAPI CPU, it should still give results similar to running directly on CPU, but that is not observed.

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
log files:
lite-model_movenet_singlepose_lightning_tflite_float16_4.tflite_android_nnapi_logcat.txt
lite-model_movenet_singlepose_lightning_tflite_int8_4.tflite_android_nnapi_logcat.txt

Comments

  1. Hi @suhrid-s

    The 2.1.0 version is already more than 2 years old (it was released back in Jan 2020).
    Could you try upgrading to the newest version and see if the problem still exists?
