Scripts · Mar 31, 2026 · 2 min read

SGLang — Fast LLM Serving with RadixAttention


Introduction

SGLang is a high-performance serving framework for large language models and multimodal models, delivering low-latency and high-throughput inference. With 25,300+ GitHub stars and Apache 2.0 license, SGLang features RadixAttention for efficient prefix caching, zero-overhead scheduling, prefill-decode disaggregation, speculative decoding, and structured output generation. It supports NVIDIA, AMD, Intel, Google TPU, and Ascend NPU hardware, with broad model compatibility including Llama, Qwen, DeepSeek, and diffusion models.

Best for: Teams deploying LLMs in production that need maximum throughput and lowest latency
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Hardware: NVIDIA, AMD, Intel, Google TPU, Ascend NPU


Key Features

  • RadixAttention: Automatic prefix caching for repeated prompts
  • Zero-overhead scheduling: Minimal dispatch latency between requests
  • Speculative decoding: Faster generation with draft models
  • Structured outputs: JSON schema-constrained generation
  • Multi-hardware: NVIDIA, AMD, Intel, TPU, Ascend NPU
  • Expert parallelism: Efficient MoE model serving
  • OpenAI-compatible API: Drop-in replacement server
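Because the server exposes an OpenAI-compatible API, structured outputs can be requested with the standard JSON-schema response format. The sketch below builds such a request body with only the standard library; the model ID, port, and endpoint path are illustrative assumptions (SGLang's default port is 30000), not a definitive recipe.

```python
import json

# Sketch: an OpenAI-style chat-completions request with a JSON-schema
# constrained response, as accepted by an OpenAI-compatible endpoint.
# Model name and server address are placeholder assumptions.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model ID
    "messages": [
        {"role": "user",
         "content": "Extract the city and country from: 'Paris, France'."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
}

# Serialize for an HTTP POST to e.g. http://localhost:30000/v1/chat/completions
body = json.dumps(payload)
```

Sending `body` with any HTTP client (or the `openai` SDK pointed at the local base URL) should yield a completion constrained to the declared schema.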

FAQ

Q: What is SGLang? A: SGLang is an LLM serving framework (25.3K+ GitHub stars) featuring RadixAttention prefix caching, speculative decoding, and multi-hardware support, with an OpenAI-compatible API. Apache 2.0 licensed.

Q: How do I install SGLang? A: Run pip install "sglang[all]" (quoting keeps the extras spec intact in shells like zsh). Launch with python -m sglang.launch_server --model <model-name>.
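The install-and-launch flow above can be sketched as the shell session below. The model ID is an example, and flag names can vary between SGLang versions (older releases use --model-path); treat this as a starting point, not an exact recipe.

```shell
# Install SGLang with all optional extras
pip install "sglang[all]"

# Launch the OpenAI-compatible server (example Hugging Face model ID)
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct

# Verify the server is up (default port 30000)
curl http://localhost:30000/v1/models
```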



Source and acknowledgments

Created by SGLang Project. Licensed under Apache 2.0. sgl-project/sglang — 25,300+ GitHub stars

