Skills · Apr 7, 2026 · 2 min read

LitServe — Fast AI Model Serving Engine

Serve AI models 2x faster than FastAPI with built-in batching, streaming, GPU autoscaling, and multi-model endpoints. From the Lightning AI team.

Quick Use

Use it first, then decide how deep to go

Install the package, then define a LitAPI and start the server:

pip install litserve

import litserve as ls

class MyAPI(ls.LitAPI):
    def setup(self, device):
        # Called once per worker; load weights onto the assigned device.
        self.model = load_model(device)  # load_model is your own loader

    def predict(self, x):
        return self.model(x)

server = ls.LitServer(MyAPI(), accelerator="gpu")
server.run(port=8000)

What is LitServe?

LitServe is a high-performance AI model serving engine built on top of FastAPI. It adds batching, streaming, GPU management, and autoscaling — making it simple to deploy any AI model as a production API with minimal code.

Answer-Ready: LitServe is a fast AI model serving engine from Lightning AI that adds batching, streaming, GPU autoscaling, and multi-model support on top of FastAPI.

Core Features

1. Automatic Batching

Combine multiple requests into one GPU batch:

server = ls.LitServer(
    MyAPI(),
    max_batch_size=16,
    batch_timeout=0.01,  # Wait 10ms to fill batch
)
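To see why these two knobs matter, here is a minimal pure-Python sketch of a micro-batching loop: wait for the first request, then keep collecting until the batch is full or the timeout expires. This illustrates the idea behind `max_batch_size` and `batch_timeout`; it is not LitServe's actual implementation.

```python
import time
from queue import Queue, Empty

def collect_batch(requests: Queue, max_batch_size: int, batch_timeout: float) -> list:
    """Drain up to max_batch_size items, waiting at most batch_timeout
    seconds after the first request for the batch to fill."""
    batch = [requests.get()]  # block until the first request arrives
    deadline = time.monotonic() + batch_timeout
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except Empty:
            break
    return batch

q = Queue()
for i in range(5):
    q.put(i)
print(collect_batch(q, max_batch_size=16, batch_timeout=0.01))  # [0, 1, 2, 3, 4]
```

A larger `batch_timeout` trades latency for bigger (more GPU-efficient) batches; `max_batch_size` caps memory use.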

2. Streaming Responses

class StreamAPI(ls.LitAPI):
    def predict(self, x):
        for token in self.model.generate(x):
            yield token

server = ls.LitServer(StreamAPI(), stream=True)

3. Multi-GPU & Autoscaling

# Use all available GPUs
server = ls.LitServer(MyAPI(), accelerator="gpu", devices="auto")

# Scale workers per device
server = ls.LitServer(MyAPI(), accelerator="gpu", workers_per_device=4)
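As a mental model (an illustrative sketch, not LitServe's internals): `workers_per_device=4` on N GPUs yields N x 4 worker processes, each pinned to one device.

```python
def assign_workers(num_devices: int, workers_per_device: int) -> list[tuple[int, int]]:
    """Return (worker_id, device_id) pairs: each device gets
    workers_per_device worker processes."""
    return [
        (d * workers_per_device + w, d)
        for d in range(num_devices)
        for w in range(workers_per_device)
    ]

# 2 GPUs x 4 workers -> 8 (worker_id, device_id) pairs
print(assign_workers(2, 4))
```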

4. Multi-Model Endpoints

server = ls.LitServer(
    {
        "gpt": GPTApi(),
        "bert": BERTApi(),
        "whisper": WhisperApi(),
    }
)
# POST /predict/gpt, /predict/bert, etc.
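The routing idea is simple: the last path segment selects which API handles the request. A self-contained sketch with hypothetical stand-in APIs (EchoAPI/UpperAPI are placeholders, not LitServe classes):

```python
class EchoAPI:
    def predict(self, x):
        return {"echo": x}

class UpperAPI:
    def predict(self, x):
        return {"upper": x.upper()}

apis = {"echo": EchoAPI(), "upper": UpperAPI()}

def route(path: str, payload):
    # "/predict/upper" -> apis["upper"].predict(payload)
    name = path.rsplit("/", 1)[-1]
    return apis[name].predict(payload)

print(route("/predict/upper", "hi"))  # {'upper': 'HI'}
```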

5. OpenAI-Compatible API

server = ls.LitServer(
    MyLLMApi(),
    spec=ls.OpenAISpec(),  # /v1/chat/completions compatible
)
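Because the server then speaks the `/v1/chat/completions` protocol, any OpenAI-style client can call it. A sketch of the request body such a client would POST (the URL and model name are placeholders):

```python
import json

# Body an OpenAI-style client would POST to
# http://localhost:8000/v1/chat/completions
payload = {
    "model": "my-llm",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}
body = json.dumps(payload)
print(body)
```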

Performance

| Feature        | FastAPI | LitServe  |
|----------------|---------|-----------|
| Batching       | Manual  | Built-in  |
| Streaming      | Manual  | Built-in  |
| GPU management | Manual  | Automatic |
| Throughput     | 1x      | ~2x       |

FAQ

Q: How does it compare to vLLM or TGI?
A: vLLM and TGI are LLM-specific. LitServe serves any model (vision, audio, tabular) with a unified API.

Q: Can I use it with PyTorch or TensorFlow?
A: Yes, it is framework-agnostic; any Python model works.

Q: Is it production-ready?
A: Yes. It is built by Lightning AI and includes health checks, metrics, and Docker support.
