# SGLang — Fast LLM Serving with RadixAttention

> SGLang is a high-performance serving framework for LLMs and multimodal models. 25.3K+ GitHub stars. RadixAttention prefix caching, speculative decoding, structured outputs. NVIDIA/AMD/Intel/TPU. Apache 2.0.

## Install

```bash
pip install sglang[all]
```

## Quick Use

```bash
# Launch server
python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --port 30000

# Query (OpenAI-compatible)
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

---

## Intro

SGLang is a high-performance serving framework for large language models and multimodal models, delivering low-latency, high-throughput inference. With 25,300+ GitHub stars and an Apache 2.0 license, SGLang features RadixAttention for efficient prefix caching, zero-overhead scheduling, prefill-decode disaggregation, speculative decoding, and structured output generation. It supports NVIDIA, AMD, Intel, Google TPU, and Ascend NPU hardware, with broad model compatibility including Llama, Qwen, DeepSeek, and diffusion models.
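Because the server exposes an OpenAI-compatible API, it can be queried from any HTTP client. Below is a minimal stdlib-only Python sketch mirroring the `curl` example above; it assumes a server is already running on `localhost:30000` with the model shown in Quick Use (the helper names are illustrative, not part of SGLang).

```python
import json
import urllib.request


def build_chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(base_url: str, model: str, user_message: str) -> str:
    """POST a chat completion to an SGLang server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # Standard OpenAI response shape: first choice's message content
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server, see Quick Use above):
# reply = chat("http://localhost:30000", "meta-llama/Llama-3.1-8B-Instruct", "Hello!")
```

The official `openai` Python client also works here: point its `base_url` at the SGLang server instead of api.openai.com.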
**Best for**: Teams deploying LLMs in production that need maximum throughput and lowest latency

**Works with**: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf

**Hardware**: NVIDIA, AMD, Intel, Google TPU, Ascend NPU

---

## Key Features

- **RadixAttention**: Automatic prefix caching for repeated prompts
- **Zero-overhead scheduling**: Minimal dispatch latency between requests
- **Speculative decoding**: Faster generation with draft models
- **Structured outputs**: JSON schema-constrained generation
- **Multi-hardware**: NVIDIA, AMD, Intel, TPU, Ascend NPU
- **Expert parallelism**: Efficient MoE model serving
- **OpenAI-compatible API**: Drop-in replacement server

---

## FAQ

**Q: What is SGLang?**
A: SGLang is an LLM serving framework with 25.3K+ stars featuring RadixAttention prefix caching, speculative decoding, and multi-hardware support. OpenAI-compatible API. Apache 2.0.

**Q: How do I install SGLang?**
A: Run `pip install sglang[all]`, then launch a server with `python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct`.

---

## Source & Thanks

> Created by [SGLang Project](https://github.com/sgl-project). Licensed under Apache 2.0.
> [sgl-project/sglang](https://github.com/sgl-project/sglang) — 25,300+ GitHub stars

---

Source: https://tokrepo.com/en/workflows/f758afa2-8fb4-4ba6-9b3d-738559d2a0b0
Author: Script Depot