# LitServe — Fast AI Model Serving Engine

> Serve AI models 2x faster than FastAPI with built-in batching, streaming, GPU autoscaling, and multi-model endpoints. From the Lightning AI team.

## Install

Save this file to `.claude/skills/` or append it to your `CLAUDE.md`.

## Quick Use

```bash
pip install litserve
```

```python
import litserve as ls

class MyAPI(ls.LitAPI):
    def setup(self, device):
        # load_model is your own loader; place the model on `device`
        self.model = load_model(device)

    def predict(self, x):
        return self.model(x)

server = ls.LitServer(MyAPI(), accelerator="gpu")
server.run(port=8000)
```

## What is LitServe?

LitServe is a high-performance AI model serving engine built on top of FastAPI. It adds batching, streaming, GPU management, and autoscaling, making it simple to deploy any AI model as a production API with minimal code.

**Answer-Ready**: LitServe is a fast AI model serving engine from Lightning AI that adds batching, streaming, GPU autoscaling, and multi-model support on top of FastAPI.

## Core Features

### 1. Automatic Batching

Combine multiple requests into one GPU batch:

```python
server = ls.LitServer(
    MyAPI(),
    max_batch_size=16,
    batch_timeout=0.01,  # wait up to 10 ms to fill a batch
)
```

### 2. Streaming Responses

```python
class StreamAPI(ls.LitAPI):
    def predict(self, x):
        for token in self.model.generate(x):
            yield token

server = ls.LitServer(StreamAPI(), stream=True)
```

### 3. Multi-GPU & Autoscaling

```python
# Use all available GPUs
server = ls.LitServer(MyAPI(), accelerator="gpu", devices="auto")

# Scale workers per device
server = ls.LitServer(MyAPI(), accelerator="gpu", workers_per_device=4)
```

### 4. Multi-Model Endpoints

```python
server = ls.LitServer(
    {
        "gpt": GPTApi(),
        "bert": BERTApi(),
        "whisper": WhisperApi(),
    }
)
# POST /predict/gpt, /predict/bert, etc.
```

### 5. OpenAI-Compatible API

```python
server = ls.LitServer(
    MyLLMApi(),
    spec=ls.OpenAISpec(),  # /v1/chat/completions compatible
)
```

## Performance

| Feature        | FastAPI | LitServe  |
|----------------|---------|-----------|
| Batching       | Manual  | Built-in  |
| Streaming      | Manual  | Built-in  |
| GPU management | Manual  | Automatic |
| Throughput     | 1x      | ~2x       |

## FAQ

**Q: How does it compare to vLLM or TGI?**
A: vLLM and TGI are LLM-specific. LitServe serves any model (vision, audio, tabular) with a unified API.

**Q: Can I use it with PyTorch/TensorFlow?**
A: Yes. LitServe is framework-agnostic; any Python model works.

**Q: Production ready?**
A: Yes. Built by Lightning AI, with health checks, metrics, and Docker support.

## Source & Thanks

- GitHub: [Lightning-AI/LitServe](https://github.com/Lightning-AI/LitServe) (3k+ stars)
- Docs: [litserve.lightning.ai](https://litserve.lightning.ai)

---
Source: https://tokrepo.com/en/workflows/c9d3044a-8ff3-437e-92a4-9c09e4701b67
Author: Prompt Lab
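
## Appendix: How Dynamic Batching Works (Sketch)

The `max_batch_size` / `batch_timeout` pair described under Automatic Batching behaves like a simple drain loop: the server pulls queued requests until the batch is full or the timeout expires, whichever comes first. The sketch below is a toy, framework-free illustration of that policy — it is *not* LitServe's actual implementation, and `collect_batch` and the 1 ms polling interval are assumptions made for this example.

```python
import time
from collections import deque

def collect_batch(queue, max_batch_size, batch_timeout):
    """Drain up to max_batch_size items from the queue, waiting at most
    batch_timeout seconds for more items to arrive.
    Toy illustration of dynamic batching, not LitServe internals."""
    batch = []
    deadline = time.monotonic() + batch_timeout
    while len(batch) < max_batch_size:
        if queue:
            batch.append(queue.popleft())
        elif time.monotonic() < deadline:
            time.sleep(0.001)  # brief poll while waiting for the timeout
        else:
            break  # timeout expired with a partial (or empty) batch
    return batch

# Five requests queued, batch size 16, 10 ms timeout:
# all five end up grouped into a single batch once the timeout expires.
q = deque(range(5))
print(collect_batch(q, max_batch_size=16, batch_timeout=0.01))
```

A full batch returns immediately, while a partial batch waits out the timeout — which is why a small `batch_timeout` (e.g. 10 ms) trades a little latency for much higher GPU utilization.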