CLI Tools · May 11, 2026 · 2 min read

Olive — Optimize Models for Faster Inference

Olive automates model optimization via a CLI so teams can reduce latency and cost (e.g., quantization/ONNX paths) before serving models in apps or agents.

Agent ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 29/100
Agent surface
Any MCP/CLI agent
Kind
CLI Tool
Install
Single
Trust
Established
Entrypoint
README.md
Universal CLI install command
npx tokrepo install 46ee49fb-a2a1-4d36-af94-e6fb4b7fa220
Intro


  • Best for: Teams serving models who want a repeatable optimization pipeline (CLI-first, configurable)
  • Works with: Python environments + Olive CLI; integrates with model download flows and hardware-specific optimization paths
  • Setup time: 30 minutes

Practical Notes

  • Setup time ~30 minutes (env + install + one optimize run)
  • Quantitative knob from the README: --precision int4 is an explicit, measurable target
  • GitHub stars + forks (verified): see Source & Thanks
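The "env + install + one optimize run" setup can be sketched as a pair of shell commands. This is a hedged sketch: the `auto-opt` subcommand and flag names follow the Olive README's CLI examples, and the model name and output path are placeholders — verify the exact flags against your installed version with `olive --help`.

```shell
# Install Olive into the current Python environment.
pip install olive-ai

# One optimize run: quantize a model to int4.
# --precision int4 is the measurable knob called out in the README.
# Model name and output path below are illustrative placeholders.
olive auto-opt \
  --model_name_or_path microsoft/Phi-3-mini-4k-instruct \
  --precision int4 \
  --output_path ./models/phi3-int4
```

The output directory becomes the build artifact you benchmark and version.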

In agent products, optimization is often the cheapest “quality win”: you can keep the same prompts and tools while reducing latency enough to make multi-step plans feasible.

Practical workflow:

  1. Define a target metric (latency, memory, cost) and hardware target.
  2. Run Olive optimizations from a config or scripted CLI invocation.
  3. Benchmark the optimized model in your actual agent loop (not only in an isolated benchmark).
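Step 3 above — benchmarking in your actual agent loop rather than an isolated harness — can be sketched as a small timing wrapper. `agent_step` here is a hypothetical stand-in for your own per-task agent callable (prompt → tool calls → answer); it is not part of Olive.

```python
import statistics
import time


def benchmark_agent_loop(agent_step, tasks, warmup=1):
    """Time an agent loop end-to-end over real tasks.

    agent_step: your own callable that runs one full task through the
    agent; a hypothetical stand-in here, not an Olive API.
    """
    # Warm up so one-off model/load costs don't skew the numbers.
    for task in tasks[:warmup]:
        agent_step(task)

    latencies = []
    for task in tasks:
        start = time.perf_counter()
        agent_step(task)
        latencies.append(time.perf_counter() - start)

    return {
        "p50_s": statistics.median(latencies),
        "mean_s": statistics.fmean(latencies),
        "n": len(latencies),
    }


# Usage with a dummy agent step standing in for a real agent call:
stats = benchmark_agent_loop(lambda task: task.upper(), ["plan", "act", "reflect"])
```

Run the same harness before and after optimization so the only variable is the model.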

Treat artifacts as build outputs: version them, and attach the exact command/config used so results are reproducible.
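Treating artifacts as build outputs can be made concrete with a small manifest writer: it hashes the artifact and config and records the exact command alongside them. A minimal sketch — the file names and manifest fields are illustrative, not an Olive convention.

```python
import hashlib
import json
from pathlib import Path


def record_build(artifact_path, command, config_path, out="build_manifest.json"):
    """Attach the exact command/config to an optimized-model artifact.

    Hashes make it easy to confirm later that a benchmark result
    corresponds to this exact artifact and config.
    """
    artifact = Path(artifact_path)
    manifest = {
        "artifact": artifact.name,
        "artifact_sha256": hashlib.sha256(artifact.read_bytes()).hexdigest(),
        "command": command,
        "config_sha256": hashlib.sha256(Path(config_path).read_bytes()).hexdigest(),
    }
    Path(out).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Check the manifest into version control next to your Olive config and benchmark notes.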

FAQ

Q: Is Olive only for ONNX? A: The README highlights ONNX-related paths, but the project is positioned as a general model optimization toolkit with configurable pipelines.

Q: How do I know optimization helped agents? A: Measure end-to-end agent latency and success rate with the optimized model in the loop.

Q: What should I version-control? A: Your Olive config/commands plus benchmark notes and artifact hashes/paths.


Source & Thanks

Source: https://github.com/microsoft/Olive · License: MIT · GitHub stars: 2,312 · forks: 295
