
ncnn — High-Performance Neural Network Inference for Mobile

ncnn is a high-performance neural network inference framework from Tencent, optimized for mobile and embedded devices with minimal dependencies.

Introduction

ncnn is a high-performance neural network inference computing framework developed by Tencent and optimized specifically for mobile and embedded platforms. It has zero third-party dependencies and can be cross-compiled for Android, iOS, and various ARM-based boards, making it a go-to choice for on-device AI.

What ncnn Does

  • Runs neural network inference on mobile CPUs and GPUs with near-native speed
  • Supports ARM NEON, Vulkan GPU, x86 SSE/AVX, and RISC-V vector acceleration
  • Converts models from PyTorch, ONNX, TensorFlow, Caffe, and MXNet formats
  • Provides quantization tools (INT8) to reduce model size and improve throughput
  • Ships a minimal runtime under 1 MB for resource-constrained devices

Architecture Overview

ncnn uses a layer-based computation graph where each operator is hand-optimized with platform-specific SIMD intrinsics. The Vulkan compute backend enables GPU inference on mobile devices without requiring CUDA. Memory is managed via a blob allocator that reuses scratch buffers across layers to minimize peak usage. The framework supports dynamic shapes and can fuse operations at load time for faster execution.
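
To ground this in code, here is a minimal sketch of how those architectural knobs surface in the C++ API: the Vulkan backend, blob recycling, and the pooled allocators are all configured through ncnn::Option before the model loads. The option fields shown are part of ncnn's public API; the model file names are placeholders.

```cpp
#include "net.h"  // ncnn public headers (installed under include/ncnn/)

int main()
{
    // Pooled allocators back the blob-reuse behavior described above:
    // scratch buffers are recycled across layers instead of reallocated.
    ncnn::UnlockedPoolAllocator blob_allocator;
    ncnn::PoolAllocator workspace_allocator;

    ncnn::Net net;
    net.opt.use_vulkan_compute = true;  // Vulkan GPU backend, no CUDA needed
    net.opt.lightmode = true;           // release intermediate blobs eagerly
    net.opt.blob_allocator = &blob_allocator;
    net.opt.workspace_allocator = &workspace_allocator;

    // Placeholder file names; operator fusion happens while the graph loads.
    net.load_param("model.param");
    net.load_model("model.bin");
    return 0;
}
```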

Self-Hosting & Configuration

  • Build from source with CMake; enable -DNCNN_VULKAN=ON for GPU support
  • Cross-compile for Android using the NDK toolchain file shipped in the repo
  • Use ncnnoptimize to strip unused layers and fuse operations in exported models
  • Convert ONNX models via onnx2ncnn; calibrate INT8 quantization with a small dataset (a load-and-run sketch follows this list)
  • Integrate into iOS projects via CocoaPods or by adding the framework directly
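
As a rough sketch of what running a converted model looks like, the snippet below loads the .param/.bin pair produced by the conversion step and feeds it a preprocessed image. The blob names in0/out0 and the normalization constants are assumptions; the real values come from your exported network's .param file.

```cpp
#include "net.h"

// Hypothetical helper: run one BGR image through a converted model.
int run_inference(const unsigned char* bgr_pixels, int width, int height)
{
    ncnn::Net net;
    if (net.load_param("model.param") != 0) return -1;
    if (net.load_model("model.bin") != 0) return -1;

    // Resize to the network's input size and convert BGR -> RGB.
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr_pixels, ncnn::Mat::PIXEL_BGR2RGB, width, height, 224, 224);

    // Example ImageNet statistics; replace with your model's values.
    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {1 / 58.395f, 1 / 57.12f, 1 / 57.375f};
    in.substract_mean_normalize(mean_vals, norm_vals);  // ncnn's spelling

    ncnn::Extractor ex = net.create_extractor();
    ex.input("in0", in);      // input blob name from the .param file

    ncnn::Mat out;
    ex.extract("out0", out);  // output blob name from the .param file
    return 0;
}
```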

Key Features

  • Zero external dependencies — pure C++ with optional Vulkan
  • Vulkan-based GPU inference works on Android, iOS, macOS, Linux, and Windows
  • INT8 and FP16 quantization with calibration tools included (the relevant option flags are sketched after this list)
  • Extensive model zoo with ready-to-use detection, segmentation, and recognition examples
  • Active community with over 23,000 GitHub stars and regular releases
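
For reference, reduced-precision execution is toggled per Net through option flags. A minimal sketch, assuming the INT8 model was already quantized with ncnn's calibration tooling:

```cpp
ncnn::Net net;

// FP16 paths trade a little precision for memory bandwidth and speed
// on hardware with native half-float support.
net.opt.use_fp16_packed = true;
net.opt.use_fp16_storage = true;
net.opt.use_fp16_arithmetic = true;

// INT8 inference assumes the model was quantized ahead of time.
net.opt.use_int8_inference = true;
```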

Comparison with Similar Tools

  • MNN — Also targets mobile; MNN has a built-in expression API while ncnn focuses on raw inference speed
  • TensorFlow Lite — Google's mobile runtime; heavier dependency footprint than ncnn
  • ONNX Runtime Mobile — Broader model compatibility but larger binary size
  • OpenVINO — Optimized for Intel hardware; ncnn targets ARM and Vulkan GPUs
  • MLC-LLM — Focused on LLM deployment; ncnn handles general vision and NLP models

FAQ

Q: Which platforms does ncnn support? A: Android, iOS, macOS, Windows, Linux, and various embedded Linux boards with ARM or RISC-V CPUs.

Q: Can ncnn run large language models? A: ncnn is optimized for vision and lightweight NLP models. For LLM inference on device, consider pairing it with a dedicated LLM runtime.

Q: How do I convert a PyTorch model to ncnn format? A: Export from PyTorch to ONNX first, then use the onnx2ncnn tool included in the repo to produce .param and .bin files.

Q: Is GPU inference supported on mobile? A: Yes, ncnn uses Vulkan for GPU compute on both Android and iOS devices that support it.
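
One way to honor the "devices that support it" caveat is to probe for a usable Vulkan device at runtime before enabling the GPU path. A small sketch:

```cpp
#include "net.h"
#include "gpu.h"  // ncnn Vulkan helpers

ncnn::Net net;
#if NCNN_VULKAN
// Fall back to the CPU path when no Vulkan-capable GPU is present.
net.opt.use_vulkan_compute = (ncnn::get_gpu_count() > 0);
#endif
```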
