# ncnn — High-Performance Neural Network Inference for Mobile

> ncnn is a high-performance neural network inference framework from Tencent, optimized for mobile and embedded devices with minimal dependencies.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:


## Quick Use
```bash
git clone https://github.com/Tencent/ncnn.git
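cd ncnn && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_BUILD_EXAMPLES=ON .. && make -j$(nproc)
# Run the squeezenet example from examples/, where its .param/.bin files live
# (the examples typically need OpenCV at build time; replace cat.jpg with any image path)
cd ../examples && ../build/examples/squeezenet cat.jpg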
```

## Introduction
ncnn is a high-performance neural network inference framework developed by Tencent and optimized for mobile and embedded platforms. It has no third-party dependencies and can be cross-compiled for Android, iOS, and many ARM-based boards, which makes it a common choice for on-device AI.

## What ncnn Does
- Runs neural network inference on mobile CPUs and GPUs using hand-optimized, platform-specific kernels
- Supports ARM NEON, Vulkan GPU, x86 SSE/AVX, and RISC-V vector acceleration
- Converts models from PyTorch, ONNX, TensorFlow, Caffe, and MXNet formats
- Provides quantization tools (INT8) to reduce model size and improve throughput
- Ships a minimal runtime under 1 MB for resource-constrained devices

## Architecture Overview
ncnn uses a layer-based computation graph where each operator is hand-optimized with platform-specific SIMD intrinsics. The Vulkan compute backend enables GPU inference on mobile devices without requiring CUDA. Memory is managed via a blob allocator that reuses scratch buffers across layers to minimize peak usage. The framework supports dynamic shapes and can fuse operations at load time for faster execution.

## Self-Hosting & Configuration
- Build from source with CMake; enable `-DNCNN_VULKAN=ON` for GPU support
- Cross-compile for Android using the NDK toolchain file shipped in the repo
- Use `ncnnoptimize` to strip unused layers and fuse operations in exported models
- Convert ONNX models via `onnx2ncnn`; for INT8, generate a calibration table from a small set of sample images with `ncnn2table`, then quantize the model with `ncnn2int8`
- Integrate into iOS projects via CocoaPods or by adding the framework directly

## Key Features
- Zero external dependencies — pure C++ with optional Vulkan
- Vulkan-based GPU inference works on Android, iOS, macOS, Linux, and Windows
- INT8 and FP16 quantization with calibration tools included
- Extensive model zoo with ready-to-use detection, segmentation, and recognition examples
- Active community with over 23,000 GitHub stars and regular releases

## Comparison with Similar Tools
- **MNN** — Also targets mobile; MNN has a built-in expression API while ncnn focuses on raw inference speed
- **TensorFlow Lite** — Google's mobile runtime; heavier dependency footprint than ncnn
- **ONNX Runtime Mobile** — Broader model compatibility but larger binary size
- **OpenVINO** — Optimized for Intel hardware; ncnn targets ARM and Vulkan GPUs
- **MLC-LLM** — Focused on LLM deployment; ncnn handles general vision and NLP models

## FAQ
**Q: Which platforms does ncnn support?**
A: Android, iOS, macOS, Windows, Linux, and various embedded Linux boards with ARM or RISC-V CPUs.

**Q: Can ncnn run large language models?**
A: ncnn is optimized for vision and lightweight NLP models. For LLM inference on device, consider pairing it with a dedicated LLM runtime.

**Q: How do I convert a PyTorch model to ncnn format?**
A: Export the model from PyTorch to ONNX, then run the `onnx2ncnn` tool included in the repo to produce `.param` and `.bin` files. The repo also ships PNNX, which converts TorchScript models to ncnn format directly.

**Q: Is GPU inference supported on mobile?**
A: Yes, ncnn uses Vulkan for GPU compute on both Android and iOS devices that support it.

## Sources
- https://github.com/Tencent/ncnn
- https://ncnn.docsforge.com/

---
Source: https://tokrepo.com/en/workflows/ncnn-high-performance-neural-network-inference-mobile-dd63bec3
Author: Script Depot