
ncnn — High-Performance Neural Network Inference for Mobile

ncnn is a high-performance neural network inference framework from Tencent, optimized for mobile and embedded devices with minimal dependencies.

Introduction

ncnn is a high-performance neural network inference computing framework developed by Tencent and optimized specifically for mobile and embedded platforms. It has zero third-party dependencies and can be cross-compiled for Android, iOS, and various ARM-based boards, making it a go-to choice for on-device AI.

What ncnn Does

  • Runs neural network inference on mobile CPUs and GPUs with near-native speed
  • Supports ARM NEON, Vulkan GPU, x86 SSE/AVX, and RISC-V vector acceleration
  • Converts models from PyTorch, ONNX, TensorFlow, Caffe, and MXNet formats
  • Provides quantization tools (INT8) to reduce model size and improve throughput
  • Ships a minimal runtime under 1 MB for resource-constrained devices

Architecture Overview

ncnn uses a layer-based computation graph where each operator is hand-optimized with platform-specific SIMD intrinsics. The Vulkan compute backend enables GPU inference on mobile devices without requiring CUDA. Memory is managed via a blob allocator that reuses scratch buffers across layers to minimize peak usage. The framework supports dynamic shapes and can fuse operations at load time for faster execution.
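
To ground this in code, here is a minimal sketch of how those architectural knobs surface in the C++ API: the Vulkan backend, blob recycling, and the pooled allocators are all configured through ncnn::Option before the model loads. The option fields shown are part of ncnn's public API; the model file names are placeholders.

```cpp
#include "net.h"  // ncnn public headers (installed under include/ncnn/)

int main()
{
    // Pooled allocators back the blob-reuse behavior described above:
    // scratch buffers are recycled across layers instead of reallocated.
    ncnn::UnlockedPoolAllocator blob_allocator;
    ncnn::PoolAllocator workspace_allocator;

    ncnn::Net net;
    net.opt.use_vulkan_compute = true;  // Vulkan GPU backend, no CUDA needed
    net.opt.lightmode = true;           // release intermediate blobs eagerly
    net.opt.blob_allocator = &blob_allocator;
    net.opt.workspace_allocator = &workspace_allocator;

    // Placeholder file names; operator fusion happens while the graph loads.
    net.load_param("model.param");
    net.load_model("model.bin");
    return 0;
}
```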

Self-Hosting & Configuration

  • Build from source with CMake; enable -DNCNN_VULKAN=ON for GPU support
  • Cross-compile for Android using the NDK toolchain file shipped in the repo
  • Use ncnnoptimize to strip unused layers and fuse operations in exported models
  • Convert ONNX models via onnx2ncnn; calibrate INT8 quantization with a small dataset (a load-and-run sketch follows this list)
  • Integrate into iOS projects via CocoaPods or by adding the framework directly
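
As a rough sketch of what running a converted model looks like, the snippet below loads the .param/.bin pair produced by the conversion step and feeds it a preprocessed image. The blob names in0/out0 and the normalization constants are assumptions; the real values come from your exported network's .param file.

```cpp
#include "net.h"

// Hypothetical helper: run one BGR image through a converted model.
int run_inference(const unsigned char* bgr_pixels, int width, int height)
{
    ncnn::Net net;
    if (net.load_param("model.param") != 0) return -1;
    if (net.load_model("model.bin") != 0) return -1;

    // Resize to the network's input size and convert BGR -> RGB.
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr_pixels, ncnn::Mat::PIXEL_BGR2RGB, width, height, 224, 224);

    // Example ImageNet statistics; replace with your model's values.
    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {1 / 58.395f, 1 / 57.12f, 1 / 57.375f};
    in.substract_mean_normalize(mean_vals, norm_vals);  // ncnn's spelling

    ncnn::Extractor ex = net.create_extractor();
    ex.input("in0", in);      // input blob name from the .param file

    ncnn::Mat out;
    ex.extract("out0", out);  // output blob name from the .param file
    return 0;
}
```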

Key Features

  • Zero external dependencies — pure C++ with optional Vulkan
  • Vulkan-based GPU inference works on Android, iOS, macOS, Linux, and Windows
  • INT8 and FP16 quantization with calibration tools included (the relevant option flags are sketched after this list)
  • Extensive model zoo with ready-to-use detection, segmentation, and recognition examples
  • Active community with over 23,000 GitHub stars and regular releases
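
For reference, reduced-precision execution is toggled per Net through option flags. A minimal sketch, assuming the INT8 model was already quantized with ncnn's calibration tooling:

```cpp
ncnn::Net net;

// FP16 paths trade a little precision for memory bandwidth and speed
// on hardware with native half-float support.
net.opt.use_fp16_packed = true;
net.opt.use_fp16_storage = true;
net.opt.use_fp16_arithmetic = true;

// INT8 inference assumes the model was quantized ahead of time.
net.opt.use_int8_inference = true;
```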

Comparison with Similar Tools

  • MNN — Also targets mobile; MNN has a built-in expression API while ncnn focuses on raw inference speed
  • TensorFlow Lite — Google's mobile runtime; heavier dependency footprint than ncnn
  • ONNX Runtime Mobile — Broader model compatibility but larger binary size
  • OpenVINO — Optimized for Intel hardware; ncnn targets ARM and Vulkan GPUs
  • MLC-LLM — Focused on LLM deployment; ncnn handles general vision and NLP models

FAQ

Q: Which platforms does ncnn support? A: Android, iOS, macOS, Windows, Linux, and various embedded Linux boards with ARM or RISC-V CPUs.

Q: Can ncnn run large language models? A: ncnn is optimized for vision and lightweight NLP models. For LLM inference on device, consider pairing it with a dedicated LLM runtime.

Q: How do I convert a PyTorch model to ncnn format? A: Export from PyTorch to ONNX first, then use the onnx2ncnn tool included in the repo to produce .param and .bin files.

Q: Is GPU inference supported on mobile? A: Yes, ncnn uses Vulkan for GPU compute on both Android and iOS devices that support it.
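
One way to honor the "devices that support it" caveat is to probe for a usable Vulkan device at runtime before enabling the GPU path. A small sketch:

```cpp
#include "net.h"
#include "gpu.h"  // ncnn Vulkan helpers

ncnn::Net net;
#if NCNN_VULKAN
// Fall back to the CPU path when no Vulkan-capable GPU is present.
net.opt.use_vulkan_compute = (ncnn::get_gpu_count() > 0);
#endif
```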
