CLI Tools · May 11, 2026 · 2 min read

Olive — Optimize Models for Faster Inference


Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 29/100
Agent surface: Any MCP/CLI agent
Type: CLI Tool
Installation: Single
Trust: Established
Entry point: README.md
Universal CLI command
npx tokrepo install 46ee49fb-a2a1-4d36-af94-e6fb4b7fa220
Introduction

Olive automates model optimization via a CLI so teams can reduce latency and cost (e.g., quantization/ONNX paths) before serving models in apps or agents.

  • Best for: Teams serving models who want a repeatable optimization pipeline (CLI-first, configurable)
  • Works with: Python environments + Olive CLI; integrates with model download flows and hardware-specific optimization paths
  • Setup time: 30 minutes

Practical Notes

  • Setup time ~30 minutes (env + install + one optimize run)
  • Quantitative knob from README: --precision int4 is an explicit, measurable target (see the scripted sketch after this list)
  • GitHub stars + forks (verified): see Source & Thanks
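
A minimal sketch of a scripted invocation that exercises the --precision int4 knob via Python's subprocess, matching the "scripted CLI invocation" mentioned in the workflow below. Only --precision int4 comes from the notes above; the auto-opt subcommand, model identifier, and remaining flags are assumptions based on README-style usage, so verify them against olive --help for your installed version.

# Sketch: drive one Olive optimization run from a script so the exact command
# is captured alongside the project. Flags other than --precision int4 are
# assumptions -- check `olive --help` for the interface your version exposes.
import subprocess

OLIVE_CMD = [
    "olive", "auto-opt",                      # assumed subcommand
    "--model_name_or_path", "path/or/hf-id",  # placeholder model reference
    "--output_path", "artifacts/optimized",   # where optimized artifacts land
    "--precision", "int4",                    # the measurable target from the README
]

def run_optimization() -> None:
    # check=True makes a failed Olive run raise, so CI can fail fast.
    subprocess.run(OLIVE_CMD, check=True)

if __name__ == "__main__":
    run_optimization()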

In agent products, optimization is often the cheapest “quality win”: you can keep the same prompts and tools while reducing latency enough to make multi-step plans feasible.

Practical workflow:

  1. Define a target metric (latency, memory, cost) and hardware target.
  2. Run Olive optimizations from a config or scripted CLI invocation.
  3. Benchmark the optimized model in your actual agent loop, not only in an isolated benchmark (see the sketch after this list).
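
For step 3, a minimal sketch of an end-to-end benchmark that times the agent loop itself rather than an isolated inference call. run_agent_task is a hypothetical placeholder for your own agent entry point, not an Olive or TokRepo API.

# Sketch: measure latency and success rate with the optimized model inside the
# real agent loop. `run_agent_task` is hypothetical -- wire it to your agent.
import statistics
import time

def run_agent_task(task: str, model_path: str) -> bool:
    """Hypothetical: run one multi-step agent task and report success."""
    raise NotImplementedError("connect this to your agent loop")

def benchmark(tasks: list[str], model_path: str) -> None:
    latencies, successes = [], 0
    for task in tasks:
        start = time.perf_counter()
        ok = run_agent_task(task, model_path)
        latencies.append(time.perf_counter() - start)
        successes += int(ok)
    print(f"median latency: {statistics.median(latencies):.2f}s")
    print(f"success rate:   {successes / len(tasks):.0%}")

# Run the same task set against the baseline and the Olive-optimized artifact,
# e.g. benchmark(tasks, "artifacts/optimized"), and compare the two reports.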

Treat artifacts as build outputs: version them, and attach the exact command/config used so results are reproducible.
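
A minimal sketch of that build-output habit: write a small manifest next to the artifact recording the exact command, config hash, and artifact hash. The file layout and field names are assumptions for illustration, not something Olive produces itself.

# Sketch: record what produced an artifact so the run is reproducible.
# Paths and field names are assumptions chosen for this example.
import hashlib
import json
import pathlib

def sha256(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def write_manifest(command: list[str], config: pathlib.Path, artifact: pathlib.Path) -> None:
    manifest = {
        "command": command,                    # exact CLI invocation used
        "config_sha256": sha256(config),       # hash of the Olive config
        "artifact_sha256": sha256(artifact),   # hash of the optimized model file
        "artifact_path": str(artifact),
    }
    out = artifact.with_name(artifact.name + ".manifest.json")
    out.write_text(json.dumps(manifest, indent=2))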

FAQ

Q: Is Olive only for ONNX? A: The README highlights ONNX-related paths, but the project is positioned as a general model optimization toolkit with configurable pipelines.

Q: How do I know optimization helped agents? A: Measure end-to-end agent latency and success rate with the optimized model in the loop.

Q: What should I version-control? A: Your Olive config/commands plus benchmark notes and artifact hashes/paths.


Source & Thanks

Source: https://github.com/microsoft/Olive · License: MIT · GitHub stars: 2,312 · forks: 295

