# SGLang — Fast LLM Serving with RadixAttention

> SGLang is a high-performance serving framework for LLMs and multimodal models. 25.3K+ GitHub stars. RadixAttention prefix caching, speculative decoding, structured outputs. NVIDIA/AMD/Intel/TPU. Apache 2.0.

## Install

```bash
pip install sglang[all]
```

## Quick Use

```bash
# Launch server
python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --port 30000

# Query (OpenAI-compatible)
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

---

## Intro

SGLang is a high-performance serving framework for large language models and multimodal models, delivering low-latency, high-throughput inference. With 25,300+ GitHub stars and an Apache 2.0 license, SGLang features RadixAttention for efficient prefix caching, zero-overhead scheduling, prefill-decode disaggregation, speculative decoding, and structured output generation. It supports NVIDIA, AMD, Intel, Google TPU, and Ascend NPU hardware, with broad model compatibility including Llama, Qwen, DeepSeek, and diffusion models.
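Because the server exposes an OpenAI-compatible API, it can be queried from any HTTP client. Below is a minimal stdlib-only Python sketch mirroring the `curl` example above; it assumes a server is already running on `localhost:30000` with the model shown in Quick Use (the helper names are illustrative, not part of SGLang).

```python
import json
import urllib.request


def build_chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(base_url: str, model: str, user_message: str) -> str:
    """POST a chat completion to an SGLang server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # Standard OpenAI response shape: first choice's message content
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server, see Quick Use above):
# reply = chat("http://localhost:30000", "meta-llama/Llama-3.1-8B-Instruct", "Hello!")
```

The official `openai` Python client also works here: point its `base_url` at the SGLang server instead of api.openai.com.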
**Best for**: Teams deploying LLMs in production that need maximum throughput and lowest latency

**Works with**: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf

**Hardware**: NVIDIA, AMD, Intel, Google TPU, Ascend NPU

---

## Key Features

- **RadixAttention**: Automatic prefix caching for repeated prompts
- **Zero-overhead scheduling**: Minimal dispatch latency between requests
- **Speculative decoding**: Faster generation with draft models
- **Structured outputs**: JSON schema-constrained generation
- **Multi-hardware**: NVIDIA, AMD, Intel, TPU, Ascend NPU
- **Expert parallelism**: Efficient MoE model serving
- **OpenAI-compatible API**: Drop-in replacement server

---

## FAQ

**Q: What is SGLang?**
A: SGLang is an LLM serving framework with 25.3K+ stars featuring RadixAttention prefix caching, speculative decoding, and multi-hardware support. OpenAI-compatible API. Apache 2.0.

**Q: How do I install SGLang?**
A: Run `pip install sglang[all]`, then launch a server with `python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct`.

---

## Source & Thanks

> Created by [SGLang Project](https://github.com/sgl-project). Licensed under Apache 2.0.
> [sgl-project/sglang](https://github.com/sgl-project/sglang) — 25,300+ GitHub stars

---

Source: https://tokrepo.com/en/workflows/f758afa2-8fb4-4ba6-9b3d-738559d2a0b0
Author: Script Depot