# Olive — Optimize Models for Faster Inference

> Olive automates model optimization via a CLI so teams can reduce latency and cost (e.g., quantization/ONNX paths) before serving models in apps or agents.

## Quick Use

1. Install / run:

   ```bash
   python -m venv .venv && source .venv/bin/activate && pip install olive-ai transformers onnxruntime-genai
   ```

2. Start / smoke test:

   ```bash
   olive optimize --help | head -n 20
   ```

3. Verify:
   - Run the README quickstart `olive optimize` example and confirm it produces an output directory with optimized artifacts.

## Intro

Olive is Microsoft's open-source model-optimization CLI: it automates quantization, ONNX conversion, and hardware-specific optimization paths so teams can cut latency and inference cost before a model is served in an app or agent, with configs and scripts keeping results reproducible and comparable.

- **Best for:** Teams serving models who want a repeatable optimization pipeline (CLI-first, config-driven)
- **Works with:** Python environments + the Olive CLI; integrates with model download flows and hardware-specific optimization paths
- **Setup time:** 30 minutes

## Practical Notes

- Setup time is about 30 minutes (environment + install + one optimize run).
- Quantitative knob from the README: `--precision int4` makes the precision/speed/cost trade-off an explicit, measurable target.
- GitHub stars and forks (verified): see Source & Thanks.

In agent products, optimization is often the cheapest “quality win”: you can keep the same prompts and tools while reducing latency enough to make multi-step plans feasible.

Practical workflow:

1. Define a target metric (latency, memory, cost) and a hardware target.
2. Run Olive optimizations from a config or scripted CLI invocation (a scripted-run sketch appears below).
3. Benchmark the optimized model in your actual agent loop, not only in an isolated benchmark (a benchmark sketch appears below).

Treat artifacts as build outputs: version them, and attach the exact command/config used so results are reproducible.
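To make workflow step 2 and the build-output advice above concrete, here is a hedged sketch that scripts one optimization run and writes a small manifest next to the artifacts. Only `--precision int4` is taken from this page; the model placeholder and the other `olive optimize` flags are assumptions to verify against `olive optimize --help`. The hashing and manifest parts use only the Python standard library.

```python
import hashlib
import json
import pathlib
import subprocess
import sys

# Hypothetical invocation: verify flag names against `olive optimize --help`.
# `--precision int4` is the knob cited above; the other flags and the model
# placeholder are assumptions, not taken from this page.
OUTPUT_DIR = pathlib.Path("optimized-model")
CMD = [
    "olive", "optimize",
    "--model_name_or_path", "<hf-model-id-or-local-path>",
    "--precision", "int4",
    "--output_path", str(OUTPUT_DIR),
]


def sha256(path: pathlib.Path) -> str:
    """Hash one artifact so benchmarks can be tied back to exact files."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def main() -> int:
    result = subprocess.run(CMD)  # run the optimization pass
    if result.returncode != 0:
        return result.returncode
    manifest = {
        "command": CMD,  # the exact invocation, for reproducibility
        "artifacts": {
            str(p.relative_to(OUTPUT_DIR)): sha256(p)
            for p in sorted(OUTPUT_DIR.rglob("*"))
            if p.is_file()
        },
    }
    (OUTPUT_DIR / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Committing the manifest alongside your Olive config and commands is one way to answer the version-control question in the FAQ below.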
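To make workflow step 3 measurable, here is a minimal benchmarking sketch. `run_agent_task` is a hypothetical stand-in for your real agent loop, not an Olive or onnxruntime-genai API; only the timing and reporting logic is meant literally.

```python
import statistics
import time


def run_agent_task(prompt: str) -> str:
    """Hypothetical stand-in: replace with your real agent loop, with the
    baseline or Olive-optimized model wired into its inference step."""
    time.sleep(0.01)  # placeholder for model calls and tool use
    return "ok"


def benchmark(prompts, label, runs_per_prompt=3):
    """Time end-to-end agent runs and report p50/p95 latency."""
    latencies = []
    for prompt in prompts:
        for _ in range(runs_per_prompt):
            start = time.perf_counter()
            run_agent_task(prompt)
            latencies.append(time.perf_counter() - start)
    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    print(f"{label}: n={len(latencies)} p50={p50:.3f}s p95={p95:.3f}s")
    return latencies


if __name__ == "__main__":
    tasks = ["summarize this support ticket", "plan a three-step web search"]
    # Run once with the baseline model and once with the Olive-optimized
    # artifacts behind run_agent_task, then compare the two reports.
    benchmark(tasks, label="optimized")
```

Running the same report against the baseline and the optimized artifacts, and tracking task success rate alongside latency, gives the end-to-end comparison discussed above.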
### FAQ

**Q: Is Olive only for ONNX?**
A: The README highlights ONNX-related paths, but the project is positioned as a general model optimization toolkit with configurable pipelines.

**Q: How do I know optimization helped my agents?**
A: Measure end-to-end agent latency and success rate with the optimized model in the loop.

**Q: What should I version-control?**
A: Your Olive config/commands plus benchmark notes and artifact hashes/paths.

## Source & Thanks

> Source: https://github.com/microsoft/Olive
> License: MIT
> GitHub stars: 2,312 · forks: 295

---

Source: https://tokrepo.com/en/workflows/olive-optimize-models-for-faster-inference
Author: AI Open Source