# ncnn: High-Performance Neural Network Inference for Mobile

> ncnn is a high-performance neural network inference framework from Tencent, optimized for mobile and embedded devices with minimal dependencies.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`.

## Quick Use

```bash
git clone https://github.com/Tencent/ncnn.git
cd ncnn && mkdir build && cd build
cmake .. && make -j$(nproc)

# Run the squeezenet example from examples/ (cat.jpg is any test image you supply)
cd ../examples && ../build/examples/squeezenet cat.jpg
```

## Introduction

ncnn is a high-performance neural network inference framework developed by Tencent and optimized specifically for mobile and embedded platforms. It has zero third-party dependencies and can be cross-compiled for Android, iOS, and various ARM-based boards, making it a go-to choice for on-device AI.

## What ncnn Does

- Runs neural network inference on mobile CPUs and GPUs using hand-optimized native kernels
- Supports ARM NEON, Vulkan GPU, x86 SSE/AVX, and RISC-V vector acceleration
- Converts models from PyTorch, ONNX, TensorFlow, Caffe, and MXNet formats
- Provides quantization tools (INT8) to reduce model size and improve throughput
- Ships a minimal runtime under 1 MB for resource-constrained devices

## Architecture Overview

ncnn uses a layer-based computation graph in which each operator is hand-optimized with platform-specific SIMD intrinsics. The Vulkan compute backend enables GPU inference on mobile devices without requiring CUDA. Memory is managed via a blob allocator that reuses scratch buffers across layers to minimize peak usage. The framework supports dynamic shapes and can fuse operations at load time for faster execution.
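The scratch-buffer reuse described above can be illustrated with a small, self-contained sketch. This is not ncnn's allocator: `ScratchPool` and `run_chain` are hypothetical names invented for this example, and the model is deliberately simplified to a straight chain of layers where each layer's input is released as soon as its output exists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scratch pool (NOT ncnn's allocator): released buffers are kept
// and handed out again to later requests that fit into them.
class ScratchPool {
public:
    // Returns the id of a free buffer with capacity >= bytes,
    // creating a fresh buffer only when no free one fits.
    std::size_t acquire(std::size_t bytes) {
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            if (!buffers_[i].in_use && buffers_[i].capacity >= bytes) {
                buffers_[i].in_use = true;
                return i;
            }
        }
        buffers_.push_back({bytes, true});
        fresh_bytes_ += bytes;
        return buffers_.size() - 1;
    }

    // Marks a buffer as free so a later layer can reuse it.
    void release(std::size_t id) {
        assert(id < buffers_.size() && buffers_[id].in_use);
        buffers_[id].in_use = false;
    }

    // Total bytes that had to be newly allocated so far.
    std::size_t fresh_bytes() const { return fresh_bytes_; }

private:
    struct Buffer {
        std::size_t capacity;
        bool in_use;
    };
    std::vector<Buffer> buffers_;
    std::size_t fresh_bytes_ = 0;
};

// Simulates a sequential chain of layers. output_bytes[i] is the size of
// layer i's output blob; a layer's input is released once its output exists.
// Returns the number of bytes that were freshly allocated for the whole chain.
std::size_t run_chain(const std::vector<std::size_t>& output_bytes) {
    ScratchPool pool;
    bool has_input = false;
    std::size_t input_id = 0;
    for (std::size_t bytes : output_bytes) {
        std::size_t output_id = pool.acquire(bytes);
        if (has_input) {
            pool.release(input_id);
        }
        input_id = output_id;
        has_input = true;
    }
    return pool.fresh_bytes();
}
```

For a chain of four 400-byte outputs, `run_chain({400, 400, 400, 400})` returns 800 rather than the 1600 bytes a one-buffer-per-layer scheme would allocate, because the chain ping-pongs between two buffers. Real graphs with branches keep blobs alive longer, so the savings depend on the network topology.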
## Self-Hosting & Configuration

- Build from source with CMake; enable `-DNCNN_VULKAN=ON` for GPU support
- Cross-compile for Android using the NDK toolchain file shipped in the repo
- Use `ncnnoptimize` to strip unused layers and fuse operations in exported models
- Convert ONNX models via `onnx2ncnn`; calibrate INT8 quantization with a small dataset
- Integrate into iOS projects via CocoaPods or by adding the framework directly

## Key Features

- Zero external dependencies: pure C++ with optional Vulkan
- Vulkan-based GPU inference works on Android, iOS, macOS, Linux, and Windows
- INT8 and FP16 quantization with calibration tools included
- Extensive model zoo with ready-to-use detection, segmentation, and recognition examples
- Active community with over 23,000 GitHub stars and regular releases

## Comparison with Similar Tools

- **MNN**: Also targets mobile; MNN has a built-in expression API while ncnn focuses on raw inference speed
- **TensorFlow Lite**: Google's mobile runtime; heavier dependency footprint than ncnn
- **ONNX Runtime Mobile**: Broader model compatibility but larger binary size
- **OpenVINO**: Optimized for Intel hardware; ncnn targets ARM and Vulkan GPUs
- **MLC-LLM**: Focused on LLM deployment; ncnn handles general vision and NLP models

## FAQ

**Q: Which platforms does ncnn support?**
A: Android, iOS, macOS, Windows, Linux, and various embedded Linux boards with ARM or RISC-V CPUs.

**Q: Can ncnn run large language models?**
A: ncnn is optimized for vision and lightweight NLP models. For LLM inference on device, consider pairing it with a dedicated LLM runtime.

**Q: How do I convert a PyTorch model to ncnn format?**
A: Export from PyTorch to ONNX first, then use the `onnx2ncnn` tool included in the repo to produce `.param` and `.bin` files.

**Q: Is GPU inference supported on mobile?**
A: Yes, ncnn uses Vulkan for GPU compute on both Android and iOS devices that support it.
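The INT8 calibration step mentioned under Self-Hosting & Configuration can be sketched in a few lines. This is a simplified illustration, not ncnn's calibration tooling: it assumes symmetric quantization with a scale derived from the largest absolute value in the calibration samples, and the function names `calibrate_scale`, `quantize`, and `dequantize` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Derives a symmetric scale from calibration samples so that the largest
// absolute value observed maps to 127. Assumption: plain abs-max calibration;
// production tools may use more robust statistics.
float calibrate_scale(const std::vector<float>& samples) {
    float absmax = 0.0f;
    for (float v : samples) {
        absmax = std::max(absmax, std::fabs(v));
    }
    assert(absmax > 0.0f);  // calibration data must contain a non-zero value
    return 127.0f / absmax;
}

// Maps a float to INT8: scale, round to nearest, clamp to [-127, 127].
std::int8_t quantize(float value, float scale) {
    float scaled = std::round(value * scale);
    scaled = std::min(127.0f, std::max(-127.0f, scaled));
    return static_cast<std::int8_t>(scaled);
}

// Recovers an approximate float from its INT8 representation.
float dequantize(std::int8_t q, float scale) {
    return static_cast<float>(q) / scale;
}
```

With calibration samples `{-2.0, 0.5, 1.0}` the scale is 63.5, so 2.0 maps to 127 and 1.0 maps to 64. Values outside the calibrated range are clamped, which is why the calibration set should resemble the inputs the model will see in deployment.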
## Sources

- https://github.com/Tencent/ncnn
- https://ncnn.docsforge.com/

---

Source: https://tokrepo.com/en/workflows/ncnn-high-performance-neural-network-inference-mobile-dd63bec3

Author: Script Depot