Key Features
- RadixAttention: Automatic prefix caching for repeated prompts
- Zero-overhead scheduling: Minimal dispatch latency between requests
- Speculative decoding: Faster generation with draft models
- Structured outputs: JSON schema-constrained generation
- Multi-hardware: NVIDIA, AMD, Intel, TPU, Ascend NPU
- Expert parallelism: Efficient MoE model serving
- OpenAI-compatible API: Drop-in replacement server
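RadixAttention's prefix reuse can be pictured with a toy radix/trie cache: repeated prompt prefixes map to already-computed KV state, so only the new suffix needs fresh computation. The sketch below is illustrative only; the `PrefixCache` class and token-level granularity are assumptions for exposition, not SGLang's actual implementation.

```python
class PrefixCache:
    """Toy trie keyed by token id; each node stands in for cached KV state."""

    def __init__(self):
        self.root = {}

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached (reusable KV)."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            matched += 1
        return matched

    def insert(self, tokens):
        """Record a prompt's tokens so later requests can reuse the prefix."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})


cache = PrefixCache()
cache.insert([1, 2, 3, 4])                 # first request: computed once, then cached
reused = cache.match_prefix([1, 2, 3, 9])  # second request shares a 3-token prefix
```

In the real system the shared prefix means the attention KV cache for those tokens is not recomputed, which is what makes repeated system prompts and few-shot templates cheap.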
FAQ
Q: What is SGLang?
A: SGLang is an LLM serving framework (25.3K+ GitHub stars) featuring RadixAttention prefix caching, speculative decoding, and multi-hardware support. It exposes an OpenAI-compatible API and is licensed under Apache 2.0.
Q: How do I install SGLang?
A: Run `pip install "sglang[all]"` (the quotes keep shells like zsh from expanding the brackets), then launch a server with `python -m sglang.launch_server --model-path <model-name>`.
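Because the server speaks the OpenAI API, any OpenAI-style client can talk to it once it is running. A minimal standard-library sketch of assembling such a request follows; the model name is a placeholder, and the port (30000 is SGLang's default) should be checked against your launch output.

```python
import json
import urllib.request


def build_chat_request(model, messages, temperature=0.0):
    """Assemble an OpenAI-style chat-completion payload for the local server."""
    return {"model": model, "messages": messages, "temperature": temperature}


payload = build_chat_request(
    "my-model",  # placeholder; use the model you launched the server with
    [{"role": "user", "content": "Hello"}],
)

# To send it to a locally running SGLang server (default port 30000):
# req = urllib.request.Request(
#     "http://localhost:30000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the payload is plain OpenAI chat-completion JSON, existing OpenAI SDKs also work unchanged by pointing their base URL at the local server.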