# Cactus: Low-Latency AI Inference Engine for Mobile Devices

> An open-source C library for running LLM inference on smartphones and wearables with optimized performance for ARM processors and edge hardware.

## Install

Save the commands under Quick Use as a script file and run it.

## Quick Use

```bash
git clone https://github.com/cactus-compute/cactus.git
cd cactus
make
# For iOS/Android, use the platform-specific build targets
```

## Introduction

Cactus is an open-source inference engine designed specifically for running LLMs and speech models on mobile devices and wearables. Built in C with ARM optimizations, it delivers low-latency inference without requiring a cloud connection, making AI capabilities available offline on resource-constrained hardware.

## What Cactus Does

- Runs quantized LLMs on iOS and Android devices
- Provides speech recognition and text-to-speech on-device
- Supports GGUF model format for efficient loading
- Delivers sub-second inference latency on modern mobile processors
- Offers native bindings for Swift, Kotlin, and React Native

## Architecture Overview

Cactus is written in C to maximize portability and minimize overhead. It uses NEON SIMD instructions on ARM processors to accelerate matrix multiplication. The engine supports 4-bit and 8-bit quantized models so that weights fit within mobile memory constraints. A thin platform abstraction layer provides native iOS and Android integration without sacrificing performance.
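
To make the memory argument for 4-bit quantization concrete, here is a minimal sketch of block-wise symmetric quantization in plain C. It is not taken from the Cactus source and does not reproduce any GGUF block layout: the `q4_block` struct, the block length of 32, and the function names are all hypothetical, chosen only to illustrate the general technique of storing one scale per block plus two 4-bit values per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32 /* hypothetical block length, for illustration only */

/* Hypothetical layout: one float scale plus 32 weights packed two per byte. */
typedef struct {
    float scale;
    uint8_t packed[BLOCK_SIZE / 2];
} q4_block;

/* Round to nearest integer without depending on libm. */
static int round_to_int(float x) {
    return (int)(x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Quantize BLOCK_SIZE floats to signed 4-bit integers in [-7, 7]. */
static void q4_quantize(const float *src, q4_block *dst) {
    float max_abs = 0.0f;
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        float a = src[i] < 0.0f ? -src[i] : src[i];
        if (a > max_abs) max_abs = a;
    }
    dst->scale = max_abs / 7.0f;
    float inv = dst->scale > 0.0f ? 1.0f / dst->scale : 0.0f;
    for (size_t i = 0; i < BLOCK_SIZE; i += 2) {
        /* Add 8 so each value is stored as an unsigned nibble in [1, 15]. */
        int lo = round_to_int(src[i] * inv) + 8;
        int hi = round_to_int(src[i + 1] * inv) + 8;
        dst->packed[i / 2] = (uint8_t)(lo | (hi << 4));
    }
}

/* Recover approximate floats from a quantized block. */
static void q4_dequantize(const q4_block *src, float *dst) {
    for (size_t i = 0; i < BLOCK_SIZE; i += 2) {
        uint8_t b = src->packed[i / 2];
        dst[i] = (float)((int)(b & 0x0F) - 8) * src->scale;
        dst[i + 1] = (float)((int)(b >> 4) - 8) * src->scale;
    }
}

/* Storage needed for n_weights weights under this hypothetical layout. */
static size_t q4_bytes(size_t n_weights) {
    assert(n_weights % BLOCK_SIZE == 0);
    return (n_weights / BLOCK_SIZE) * sizeof(q4_block);
}
```

Under this assumed layout each block of 32 weights occupies 20 bytes, or 5 bits per weight, compared with 32 bits per weight for unquantized `float` storage. The per-weight reconstruction error is bounded by half the block's scale, which is the trade-off that lets small models fit in mobile memory.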

## Self-Hosting & Configuration

- Build from source with make or CMake for your target platform
- Use pre-built iOS and Android libraries from releases
- Load GGUF-format models from local storage
- Configure thread count and memory limits for your device
- Integrate via C API, Swift bindings, or Kotlin bindings

## Key Features

- Optimized for ARM processors with NEON SIMD acceleration
- Supports LLM inference and Whisper-based speech recognition
- Sub-100ms token generation on modern mobile chips
- GGUF model format with 4-bit and 8-bit quantization
- Native bindings for iOS (Swift), Android (Kotlin), and React Native

## Comparison with Similar Tools

- **llama.cpp**: desktop-focused; Cactus is optimized for mobile ARM targets
- **ExecuTorch**: PyTorch ecosystem; Cactus uses GGUF for simpler model deployment
- **MLC-LLM**: broader scope; Cactus prioritizes minimal footprint on phones
- **ONNX Runtime Mobile**: general ML; Cactus specializes in LLM and speech workloads

## FAQ

**Q: What models can it run?**
A: Any GGUF-format model, including Llama, Mistral, Phi, and Whisper variants.

**Q: Does it need a GPU?**
A: No, it runs on the CPU with ARM NEON optimizations. GPU acceleration is optional where available.

**Q: What is the minimum device requirement?**
A: It runs on devices with 2 GB+ RAM using small quantized models (1-3B parameters).

**Q: Can I use it in a React Native app?**
A: Yes, React Native bindings are provided for cross-platform mobile development.

## Sources

- https://github.com/cactus-compute/cactus

---

Source: https://tokrepo.com/en/workflows/asset-00209c52
Author: Script Depot