# Olive — Optimize Models for Faster Inference

> Olive automates model optimization via a CLI so teams can reduce latency and cost (e.g., quantization/ONNX paths) before serving models in apps or agents.

## Quick Use

1. Install / run:

   ```bash
   python -m venv .venv && source .venv/bin/activate && pip install olive-ai transformers onnxruntime-genai
   ```

2. Start / smoke test:

   ```bash
   olive optimize --help | head -n 20
   ```

3. Verify:
   - Run the README quickstart `olive optimize` example and confirm it produces an output directory with optimized artifacts.

## Intro

Olive is Microsoft's open-source model-optimization CLI: it automates quantization, ONNX conversion, and hardware-specific optimization paths so teams can cut latency and inference cost before a model is served in an app or agent, with configs and scripts keeping results reproducible and comparable.

- **Best for:** Teams serving models who want a repeatable optimization pipeline (CLI-first, config-driven)
- **Works with:** Python environments + the Olive CLI; integrates with model download flows and hardware-specific optimization paths
- **Setup time:** 30 minutes

## Practical Notes

- Setup time is about 30 minutes (environment + install + one optimize run).
- Quantitative knob from the README: `--precision int4` makes the precision/speed/cost trade-off an explicit, measurable target.
- GitHub stars and forks (verified): see Source & Thanks.

In agent products, optimization is often the cheapest “quality win”: you can keep the same prompts and tools while reducing latency enough to make multi-step plans feasible.

Practical workflow:

1. Define a target metric (latency, memory, cost) and a hardware target.
2. Run Olive optimizations from a config or scripted CLI invocation (a scripted-run sketch appears below).
3. Benchmark the optimized model in your actual agent loop, not only in an isolated benchmark (a benchmark sketch appears below).

Treat artifacts as build outputs: version them, and attach the exact command/config used so results are reproducible.
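To make workflow step 2 and the build-output advice above concrete, here is a hedged sketch that scripts one optimization run and writes a small manifest next to the artifacts. Only `--precision int4` is taken from this page; the model placeholder and the other `olive optimize` flags are assumptions to verify against `olive optimize --help`. The hashing and manifest parts use only the Python standard library.

```python
import hashlib
import json
import pathlib
import subprocess
import sys

# Hypothetical invocation: verify flag names against `olive optimize --help`.
# `--precision int4` is the knob cited above; the other flags and the model
# placeholder are assumptions, not taken from this page.
OUTPUT_DIR = pathlib.Path("optimized-model")
CMD = [
    "olive", "optimize",
    "--model_name_or_path", "<hf-model-id-or-local-path>",
    "--precision", "int4",
    "--output_path", str(OUTPUT_DIR),
]


def sha256(path: pathlib.Path) -> str:
    """Hash one artifact so benchmarks can be tied back to exact files."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def main() -> int:
    result = subprocess.run(CMD)  # run the optimization pass
    if result.returncode != 0:
        return result.returncode
    manifest = {
        "command": CMD,  # the exact invocation, for reproducibility
        "artifacts": {
            str(p.relative_to(OUTPUT_DIR)): sha256(p)
            for p in sorted(OUTPUT_DIR.rglob("*"))
            if p.is_file()
        },
    }
    (OUTPUT_DIR / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Committing the manifest alongside your Olive config and commands is one way to answer the version-control question in the FAQ below.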
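To make workflow step 3 measurable, here is a minimal benchmarking sketch. `run_agent_task` is a hypothetical stand-in for your real agent loop, not an Olive or onnxruntime-genai API; only the timing and reporting logic is meant literally.

```python
import statistics
import time


def run_agent_task(prompt: str) -> str:
    """Hypothetical stand-in: replace with your real agent loop, with the
    baseline or Olive-optimized model wired into its inference step."""
    time.sleep(0.01)  # placeholder for model calls and tool use
    return "ok"


def benchmark(prompts, label, runs_per_prompt=3):
    """Time end-to-end agent runs and report p50/p95 latency."""
    latencies = []
    for prompt in prompts:
        for _ in range(runs_per_prompt):
            start = time.perf_counter()
            run_agent_task(prompt)
            latencies.append(time.perf_counter() - start)
    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    print(f"{label}: n={len(latencies)} p50={p50:.3f}s p95={p95:.3f}s")
    return latencies


if __name__ == "__main__":
    tasks = ["summarize this support ticket", "plan a three-step web search"]
    # Run once with the baseline model and once with the Olive-optimized
    # artifacts behind run_agent_task, then compare the two reports.
    benchmark(tasks, label="optimized")
```

Running the same report against the baseline and the optimized artifacts, and tracking task success rate alongside latency, gives the end-to-end comparison discussed above.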
### FAQ

**Q: Is Olive only for ONNX?**
A: The README highlights ONNX-related paths, but the project is positioned as a general model optimization toolkit with configurable pipelines.

**Q: How do I know optimization helped my agents?**
A: Measure end-to-end agent latency and success rate with the optimized model in the loop.

**Q: What should I version-control?**
A: Your Olive config/commands plus benchmark notes and artifact hashes/paths.

## Source & Thanks

> Source: https://github.com/microsoft/Olive
> License: MIT
> GitHub stars: 2,312 · forks: 295

---

Source: https://tokrepo.com/en/workflows/olive-optimize-models-for-faster-inference
Author: AI Open Source