# LitServe — Fast AI Model Serving Engine

> Serve AI models 2x faster than FastAPI with built-in batching, streaming, GPU autoscaling, and multi-model endpoints. From the Lightning AI team.

## Install

Save this file to `.claude/skills/` or append it to your `CLAUDE.md`.

## Quick Use

```bash
pip install litserve
```

```python
import litserve as ls

class MyAPI(ls.LitAPI):
    def setup(self, device):
        # load_model is your own loader; place the model on `device`
        self.model = load_model(device)

    def predict(self, x):
        return self.model(x)

server = ls.LitServer(MyAPI(), accelerator="gpu")
server.run(port=8000)
```

## What is LitServe?

LitServe is a high-performance AI model serving engine built on top of FastAPI. It adds batching, streaming, GPU management, and autoscaling, making it simple to deploy any AI model as a production API with minimal code.

**Answer-Ready**: LitServe is a fast AI model serving engine from Lightning AI that adds batching, streaming, GPU autoscaling, and multi-model support on top of FastAPI.

## Core Features

### 1. Automatic Batching

Combine multiple requests into one GPU batch:

```python
server = ls.LitServer(
    MyAPI(),
    max_batch_size=16,
    batch_timeout=0.01,  # wait up to 10 ms to fill a batch
)
```

### 2. Streaming Responses

```python
class StreamAPI(ls.LitAPI):
    def predict(self, x):
        for token in self.model.generate(x):
            yield token

server = ls.LitServer(StreamAPI(), stream=True)
```

### 3. Multi-GPU & Autoscaling

```python
# Use all available GPUs
server = ls.LitServer(MyAPI(), accelerator="gpu", devices="auto")

# Scale workers per device
server = ls.LitServer(MyAPI(), accelerator="gpu", workers_per_device=4)
```

### 4. Multi-Model Endpoints

```python
server = ls.LitServer(
    {
        "gpt": GPTApi(),
        "bert": BERTApi(),
        "whisper": WhisperApi(),
    }
)
# POST /predict/gpt, /predict/bert, etc.
```

### 5. OpenAI-Compatible API

```python
server = ls.LitServer(
    MyLLMApi(),
    spec=ls.OpenAISpec(),  # /v1/chat/completions compatible
)
```

## Performance

| Feature        | FastAPI | LitServe  |
|----------------|---------|-----------|
| Batching       | Manual  | Built-in  |
| Streaming      | Manual  | Built-in  |
| GPU management | Manual  | Automatic |
| Throughput     | 1x      | ~2x       |

## FAQ

**Q: How does it compare to vLLM or TGI?**
A: vLLM and TGI are LLM-specific. LitServe serves any model (vision, audio, tabular) with a unified API.

**Q: Can I use it with PyTorch/TensorFlow?**
A: Yes. LitServe is framework-agnostic; any Python model works.

**Q: Production ready?**
A: Yes. Built by Lightning AI, with health checks, metrics, and Docker support.

## Source & Thanks

- GitHub: [Lightning-AI/LitServe](https://github.com/Lightning-AI/LitServe) (3k+ stars)
- Docs: [litserve.lightning.ai](https://litserve.lightning.ai)

---
Source: https://tokrepo.com/en/workflows/c9d3044a-8ff3-437e-92a4-9c09e4701b67
Author: Prompt Lab
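
## Appendix: How Dynamic Batching Works (Sketch)

The `max_batch_size` / `batch_timeout` pair described under Automatic Batching behaves like a simple drain loop: the server pulls queued requests until the batch is full or the timeout expires, whichever comes first. The sketch below is a toy, framework-free illustration of that policy — it is *not* LitServe's actual implementation, and `collect_batch` and the 1 ms polling interval are assumptions made for this example.

```python
import time
from collections import deque

def collect_batch(queue, max_batch_size, batch_timeout):
    """Drain up to max_batch_size items from the queue, waiting at most
    batch_timeout seconds for more items to arrive.
    Toy illustration of dynamic batching, not LitServe internals."""
    batch = []
    deadline = time.monotonic() + batch_timeout
    while len(batch) < max_batch_size:
        if queue:
            batch.append(queue.popleft())
        elif time.monotonic() < deadline:
            time.sleep(0.001)  # brief poll while waiting for the timeout
        else:
            break  # timeout expired with a partial (or empty) batch
    return batch

# Five requests queued, batch size 16, 10 ms timeout:
# all five end up grouped into a single batch once the timeout expires.
q = deque(range(5))
print(collect_batch(q, max_batch_size=16, batch_timeout=0.01))
```

A full batch returns immediately, while a partial batch waits out the timeout — which is why a small `batch_timeout` (e.g. 10 ms) trades a little latency for much higher GPU utilization.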