What is LitServe?
LitServe is a high-performance AI model serving engine from Lightning AI, built on top of FastAPI. It adds batching, streaming, GPU management, autoscaling, and multi-model support, making it simple to deploy any AI model as a production API with minimal code.
Core Features
1. Automatic Batching
Combine multiple requests into one GPU batch:
```python
import litserve as ls

server = ls.LitServer(
    MyAPI(),
    max_batch_size=16,
    batch_timeout=0.01,  # wait up to 10 ms to fill a batch
)
```

2. Streaming Responses
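When predict is a generator, each yielded token can be flushed to the client as soon as it is produced, instead of waiting for the full completion. A plain-Python sketch of that pattern (the toy generate function below is a hypothetical stand-in for a model's decoder, not LitServe code):

```python
def generate(prompt):
    # Stand-in for a model's incremental decoder: emits one token at a time.
    for token in prompt.split():
        yield token

def predict(x):
    # Mirrors a streaming predict hook: re-yield tokens as they arrive.
    for token in generate(x):
        yield token

# The consumer receives tokens incrementally, not one final string.
tokens = list(predict("hello streaming world"))
```

With stream=True, each yielded chunk is sent over the open response as it is produced.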
```python
class StreamAPI(ls.LitAPI):
    def predict(self, x):
        for token in self.model.generate(x):
            yield token

server = ls.LitServer(StreamAPI(), stream=True)
```

3. Multi-GPU & Autoscaling
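With devices="auto", every visible accelerator is used, and workers_per_device multiplies that: the total number of serving workers is the device count times workers_per_device. A quick arithmetic sanity check (the device count is hypothetical):

```python
num_devices = 4          # e.g. a 4-GPU machine (hypothetical)
workers_per_device = 4   # matches the configuration shown in this section
total_workers = num_devices * workers_per_device
# 16 worker processes share incoming requests across the 4 GPUs.
```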
```python
# Use all available GPUs
server = ls.LitServer(MyAPI(), accelerator="gpu", devices="auto")

# Scale workers per device
server = ls.LitServer(MyAPI(), accelerator="gpu", workers_per_device=4)
```

4. Multi-Model Endpoints
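The name-to-endpoint mapping can be pictured as a simple dispatch table: the last path segment selects which model's predict runs. This plain-Python sketch illustrates the routing idea only and is not LitServe's implementation (the lambda models are hypothetical stand-ins):

```python
# Hypothetical stand-ins for the per-model predict functions.
models = {
    "gpt": lambda x: f"gpt:{x}",
    "bert": lambda x: f"bert:{x}",
    "whisper": lambda x: f"whisper:{x}",
}

def route(path, payload):
    # "/predict/gpt" -> dispatch to the "gpt" model.
    name = path.rsplit("/", 1)[-1]
    return models[name](payload)

result = route("/predict/bert", "hello")
```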
```python
server = ls.LitServer(
    {
        "gpt": GPTApi(),
        "bert": BERTApi(),
        "whisper": WhisperApi(),
    }
)
# POST /predict/gpt, /predict/bert, etc.
```

5. OpenAI-Compatible API
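With the OpenAI spec enabled, clients talk to the server using the standard chat-completions request shape, so existing OpenAI client code works unchanged. A sketch of the request body a client would POST to /v1/chat/completions (the model name and message are illustrative):

```python
import json

# Chat-completions request body, per OpenAI's API shape.
payload = {
    "model": "my-llm",  # hypothetical model name
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}
body = json.dumps(payload)
```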
```python
server = ls.LitServer(
    MyLLMApi(),
    spec=ls.OpenAISpec(),  # serves /v1/chat/completions
)
```

Performance
| Feature | FastAPI | LitServe |
|---|---|---|
| Batching | Manual | Built-in |
| Streaming | Manual | Built-in |
| GPU management | Manual | Automatic |
| Throughput | 1x (baseline) | ~2x |
FAQ
Q: How does it compare to vLLM or TGI? A: vLLM and TGI are specialized LLM inference servers. LitServe serves any model type (vision, audio, tabular, LLMs) with a unified API.
Q: Can I use it with PyTorch or TensorFlow? A: Yes. LitServe is framework-agnostic; any Python model works.
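Framework-agnosticism follows from the hook-based request lifecycle: setup loads the model once per worker, decode_request parses the input, predict runs the model, and encode_response formats the output. The hook names follow LitServe's docs, but this sketch is plain Python with a trivial stand-in model, no server required:

```python
class SquareAPI:
    # Plain-Python stand-in mirroring LitAPI's hooks; not a real LitAPI subclass.
    def setup(self, device):
        # Runs once per worker: load weights, move the model to the device.
        self.model = lambda x: x * x

    def decode_request(self, request):
        # Pull the model input out of the JSON body.
        return request["input"]

    def predict(self, x):
        return self.model(x)

    def encode_response(self, output):
        return {"output": output}

api = SquareAPI()
api.setup("cpu")
response = api.encode_response(api.predict(api.decode_request({"input": 4})))
```

Any object with these four steps can be served the same way, which is why the framework underneath does not matter.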
Q: Is it production-ready? A: Yes. It is built by Lightning AI and includes health checks, metrics, and Docker support.