# LitServe: Fast AI Model Serving Engine

> Serve AI models roughly 2x faster than plain FastAPI with built-in batching, streaming, GPU autoscaling, and multi-model endpoints. From the Lightning AI team.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

```bash
pip install litserve
```

```python
import litserve as ls


class MyAPI(ls.LitAPI):
    def setup(self, device):
        # load_model is a placeholder for your own model-loading function
        self.model = load_model(device)

    def predict(self, x):
        return self.model(x)


if __name__ == "__main__":
    # accelerator="gpu" requires a GPU on the host
    server = ls.LitServer(MyAPI(), accelerator="gpu")
    server.run(port=8000)
```

## What is LitServe?

LitServe is a high-performance AI model serving engine built on top of FastAPI. It adds batching, streaming, GPU management, and autoscaling, so you can deploy any AI model as a production API with minimal code.

**Answer-Ready**: LitServe is a fast AI model serving engine from Lightning AI that adds batching, streaming, GPU autoscaling, and multi-model support on top of FastAPI.

## Core Features

### 1. Automatic Batching

Combine multiple requests into one GPU batch:

```python
# Note: newer LitServe releases may expect these two arguments on the
# LitAPI instead of the LitServer; check the docs for your version.
server = ls.LitServer(
    MyAPI(),
    max_batch_size=16,
    batch_timeout=0.01,  # wait up to 10 ms for the batch to fill
)
```

### 2. Streaming Responses

```python
class StreamAPI(ls.LitAPI):
    # Assumes self.model was created in setup() and exposes generate()
    def predict(self, x):
        for token in self.model.generate(x):
            yield token

    # With stream=True, encode_response should also be a generator
    def encode_response(self, outputs):
        for token in outputs:
            yield {"output": token}


server = ls.LitServer(StreamAPI(), stream=True)
```

### 3. Multi-GPU & Autoscaling

```python
# Use all available GPUs
server = ls.LitServer(MyAPI(), accelerator="gpu", devices="auto")

# Run several worker processes (model copies) on each device
server = ls.LitServer(MyAPI(), accelerator="gpu", workers_per_device=4)
```

### 4. Multi-Model Endpoints

```python
# Assumes a LitServe release that accepts a list of LitAPI instances,
# each routed by its own api_path; verify against your installed version.
server = ls.LitServer(
    [
        GPTApi(api_path="/predict/gpt"),
        BERTApi(api_path="/predict/bert"),
        WhisperApi(api_path="/predict/whisper"),
    ]
)
# POST /predict/gpt, /predict/bert, etc.
```

### 5. OpenAI-Compatible API

```python
server = ls.LitServer(
    MyLLMApi(),
    spec=ls.OpenAISpec(),  # /v1/chat/completions compatible
)
```

## Performance

| Feature | FastAPI | LitServe |
|---------|---------|----------|
| Batching | Manual | Built-in |
| Streaming | Manual | Built-in |
| GPU Mgmt | Manual | Automatic |
| Throughput | 1x | ~2x |

## FAQ

**Q: How does it compare to vLLM or TGI?**
A: vLLM and TGI are LLM-specific. LitServe serves any model (vision, audio, tabular) with a unified API.

**Q: Can I use it with PyTorch/TensorFlow?**
A: Yes, it is framework-agnostic. Any Python model works.

**Q: Production ready?**
A: Yes, built by Lightning AI. Includes health checks, metrics, and Docker support.

## Source & Thanks

- GitHub: [Lightning-AI/LitServe](https://github.com/Lightning-AI/LitServe) (3k+ stars)
- Docs: [litserve.lightning.ai](https://litserve.lightning.ai)

## Quick Start

```bash
pip install litserve
```

Deploy an AI model as a production-grade API in five lines of code.

## What is LitServe?

LitServe is a high-performance AI model serving engine from Lightning AI. Built on top of FastAPI, it adds batching, streaming output, GPU management, and auto-scaling.

**In one sentence**: LitServe is a fast AI model serving engine with built-in batching, streaming output, GPU auto-scaling, and multi-model support.

## Core Features

### 1. Automatic Batching

Combines multiple requests into a single GPU inference pass.

### 2. Streaming Responses

Token-by-token streaming output.

### 3. Multi-GPU Auto-Scaling

Automatically detects and allocates GPU resources.

### 4. Multi-Model Endpoints

Deploy multiple models on a single server.

### 5. OpenAI Compatible

Supports the standard `/v1/chat/completions` interface.

## FAQ

**Q: How does it compare to vLLM?**
A: vLLM focuses on LLMs; LitServe supports any model (vision, audio, tabular).

**Q: Production ready?**
A: Yes, built by Lightning AI with health checks and metrics monitoring.
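
To make the automatic batching feature concrete, here is a minimal standard-library sketch of the general idea behind `max_batch_size` and `batch_timeout`: wait for a first request, then keep collecting until the batch is full or the timeout expires. This is not LitServe's implementation. The names `collect_batch` and `model` are hypothetical and exist only for this illustration.

```python
import queue
import time


def collect_batch(requests, max_batch_size=16, batch_timeout=0.01):
    """Gather up to max_batch_size items, waiting at most batch_timeout
    seconds after the first item arrives."""
    batch = [requests.get()]  # block until at least one request exists
    deadline = time.monotonic() + batch_timeout
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: ship a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time
    return batch


def model(batch):
    # Hypothetical stand-in for a single batched forward pass.
    return [x * x for x in batch]


if __name__ == "__main__":
    pending = queue.Queue()
    for value in range(5):
        pending.put(value)

    # The first call fills a whole batch; the second ships the leftover
    # request once the timeout expires.
    first = collect_batch(pending, max_batch_size=4, batch_timeout=0.5)
    second = collect_batch(pending, max_batch_size=4, batch_timeout=0.05)
    print(model(first), model(second))
```

The trade-off is the usual one: a larger `batch_timeout` gives fuller batches and better accelerator utilization, at the cost of added latency for the first request in each batch.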
---

Source: https://tokrepo.com/en/workflows/litserve-fast-ai-model-serving-engine-c9d3044a
Author: Prompt Lab