Introduction
LLVM is a set of compiler and toolchain technologies that provide a language-independent intermediate representation, optimization passes, and code generation backends. Originally developed at the University of Illinois, LLVM now underpins compilers for C, C++, Rust, Swift, Julia, and many other languages.
What LLVM Does
- Provides a typed intermediate representation (LLVM IR) for language-independent optimization
- Compiles and optimizes code for dozens of hardware targets including x86, ARM, and RISC-V
- Serves as the code generation backend for Clang (C/C++), Rust, Swift, and Kotlin/Native
- Includes tools for static analysis, sanitizers, profiling, and link-time optimization
- Supports JIT compilation for dynamic languages and runtime code generation
Architecture Overview
LLVM uses a three-phase design: frontends (like Clang) lower source code to LLVM IR, the middle-end runs target-independent optimization passes, and backends generate machine code for specific architectures. Each phase is packaged as reusable libraries. LLVM IR is a strongly typed, SSA-based representation that can be serialized to bitcode for LTO and distributed builds.
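The three phases can be walked through by hand with the standard command-line tools. The sketch below is illustrative only: hello.c and the output file names are placeholders, and the DRY_RUN switch and run helper are conveniences invented for this example so that the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Three-phase walk-through with the stock LLVM tools (clang, opt, llc).
# hello.c and the output names are placeholders, not files from this document.
set -eu

steps=0

# Print each command; execute it only when DRY_RUN=0, so the sketch is safe
# to run on a machine without LLVM installed. The printed trace is for
# reading only (shell quoting is not preserved in it).
run() {
    steps=$((steps + 1))
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Phase 1, frontend: Clang lowers C to textual LLVM IR (.ll).
# -O1 is used because -O0 marks functions optnone, which makes opt skip them.
run clang -O1 -S -emit-llvm hello.c -o hello.ll

# Phase 2, middle-end: opt runs the target-independent O2 pass pipeline.
run opt -passes='default<O2>' -S hello.ll -o hello.opt.ll

# Phase 3, backend: llc generates an object file for the host target.
# PIC is requested so the object links on systems that default to PIE.
run llc -relocation-model=pic -filetype=obj hello.opt.ll -o hello.o

# Linking is delegated to the Clang driver.
run clang hello.o -o hello

echo "pipeline steps: $steps"
```

Running it with DRY_RUN=0 in a directory that contains a hello.c should leave the intermediate hello.ll and hello.opt.ll files behind, which makes it easy to diff the IR before and after optimization.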
Self-Hosting & Configuration
- Build from source with CMake: cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld" ../llvm
- Install pre-built packages via apt, brew, or the official LLVM apt repository
- Enable specific backends with -DLLVM_TARGETS_TO_BUILD="X86;AArch64"
- Use -DCMAKE_BUILD_TYPE=Release for production builds
- Add sanitizer support with -DLLVM_USE_SANITIZER=Address
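The CMake options listed above are usually combined into a single configure command. The following is a minimal sketch, assuming it is run from a build directory next to the llvm/ source directory of a checkout. The configure_llvm function and the DRY_RUN and LLVM_SANITIZER variables are hypothetical names introduced for this example; only the -D options come from the list above.

```shell
#!/bin/sh
# Assemble the CMake options from the list above into one configure command.
# configure_llvm, DRY_RUN and LLVM_SANITIZER are names made up for this sketch.
set -eu

configure_llvm() {
    src_dir=$1

    # Base configuration: Ninja generator, Clang and LLD enabled,
    # two backends, optimized build.
    set -- -G Ninja \
        "-DLLVM_ENABLE_PROJECTS=clang;lld" \
        "-DLLVM_TARGETS_TO_BUILD=X86;AArch64" \
        "-DCMAKE_BUILD_TYPE=Release"

    # Optional: build LLVM itself with a sanitizer, e.g. LLVM_SANITIZER=Address.
    if [ -n "${LLVM_SANITIZER:-}" ]; then
        set -- "$@" "-DLLVM_USE_SANITIZER=$LLVM_SANITIZER"
    fi

    set -- "$@" "$src_dir"

    # Show the command with every argument quoted (the semicolons need it).
    printf 'cmake'
    printf " '%s'" "$@"
    printf '\n'

    # Only invoke CMake when explicitly requested.
    if [ "${DRY_RUN:-1}" = "0" ]; then
        cmake "$@"
    fi
}

configure_llvm ../llvm
```

By default the script only prints the command. Setting DRY_RUN=0 runs the configure step, after which the build itself would typically be started with ninja in the same directory.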
Key Features
- Language-agnostic IR used by over 20 language frontends
- Over 100 optimization passes including auto-vectorization, plus polyhedral optimization through the Polly subproject
- Code generation for all major CPU architectures and GPU targets
- Built-in sanitizers (ASan, MSan, TSan, UBSan) for memory and concurrency bugs
- C API, kept more stable than the C++ API, for embedding LLVM in external tools and JIT engines
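As an illustration of the sanitizer feature in the list above, the sketch below compiles a deliberately broken C program with AddressSanitizer. The file names and the asan_status variable are placeholders invented for this example, and the script skips the compile step when clang is not on the PATH.

```shell
#!/bin/sh
# Compile a small program containing a heap overflow with AddressSanitizer.
# File names and asan_status are placeholders chosen for this sketch.
set -eu

workdir=$(mktemp -d)
src="$workdir/overflow.c"

cat > "$src" <<'EOF'
#include <stdlib.h>

int main(void) {
    int *a = malloc(4 * sizeof *a);
    a[4] = 1;   /* writes one element past the end of the allocation */
    free(a);
    return 0;
}
EOF

asan_status="skipped"
if command -v clang >/dev/null 2>&1 &&
   clang -g -fsanitize=address -fno-omit-frame-pointer \
         "$src" -o "$workdir/overflow"; then
    # An instrumented binary exits non-zero when ASan detects an error,
    # so the run is wrapped in an if to keep set -e from aborting the script.
    if "$workdir/overflow" 2> "$workdir/report.txt"; then
        asan_status="no error reported"
    else
        asan_status="error reported"
        head -n 3 "$workdir/report.txt"
    fi
fi

echo "ASan run: $asan_status"
```

The other sanitizers are selected the same way, for example -fsanitize=thread or -fsanitize=undefined, although not every combination can be enabled in a single build.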
Comparison with Similar Tools
- GCC — mature alternative with broader legacy platform support; less modular architecture
- Cranelift — lightweight code generator for Wasmtime; narrower scope than full LLVM
- MSVC — Microsoft compiler for Windows; proprietary and platform-locked
- GraalVM — JVM-based polyglot runtime; targets managed languages rather than native compilation
FAQ
Q: What languages use LLVM as their backend? A: C and C++ (via Clang), Rust, Swift, Kotlin/Native, Julia, Zig, Fortran (via Flang), and many more.
Q: Can LLVM be used as a JIT compiler? A: Yes. The ORC JIT library provides lazy compilation and linking for runtime code generation.
Q: How large is the LLVM codebase? A: The monorepo contains several million lines of C++ across LLVM core, Clang, LLD, LLDB, and other sub-projects.
Q: Is LLVM only for compiled languages? A: No. JIT compilers for dynamic languages use LLVM as well: Numba compiles numerical Python functions through LLVM at run time, and WebKit's JavaScriptCore shipped an LLVM-based optimizing tier (FTL) before replacing it with its own B3 backend.
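The JIT path mentioned in the FAQ can be tried without writing any C++ by using lli, the LLVM tool that executes IR directly. The sketch below writes a tiny hand-written module and runs it if lli is installed. The file name and the jit_status variable are placeholders invented for this example, and the --jit-kind=orc-lazy spelling is an assumption that may differ between LLVM releases.

```shell
#!/bin/sh
# Run a hand-written LLVM IR module through the JIT with lli.
# answer.ll and jit_status are placeholders chosen for this sketch.
set -eu

workdir=$(mktemp -d)
ir="$workdir/answer.ll"

cat > "$ir" <<'EOF'
; SSA form: %sum is assigned exactly once and carries an explicit type (i32).
define i32 @main() {
entry:
  %sum = add i32 40, 2
  ret i32 %sum
}
EOF

jit_status="skipped"
if command -v lli >/dev/null 2>&1; then
    # lli JIT-compiles the module and calls main; the value returned by
    # main becomes the exit status of the lli process.
    if lli --jit-kind=orc-lazy "$ir"; then
        jit_status=0
    else
        jit_status=$?
    fi
fi

echo "lli exit status: $jit_status"
```

Embedding a JIT in an application goes through the ORC C++ or C APIs rather than lli, but the tool is a quick way to check that a module is well formed and executable.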