Scripts · May 19, 2026 · 3 min read

ONNX Runtime — Cross-Platform ML Inference Accelerator

ONNX Runtime is Microsoft's high-performance inference engine for machine learning models in the ONNX format. It supports CPU, GPU, and specialized hardware accelerators across Linux, Windows, macOS, iOS, Android, and the web browser.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an installation contract, JSON metadata, an adapter-specific plan, and raw content so that agents can evaluate compatibility, risk, and next steps.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry: Quick Use

Universal CLI command:
npx tokrepo install 59114755-537e-11f1-9bc6-00163e2b0d79

Introduction

ONNX Runtime (ORT) is a cross-platform inference and training accelerator compatible with models from PyTorch, TensorFlow, scikit-learn, and other frameworks exported to the ONNX format. It is used in production at Microsoft across Office, Azure, Bing, and Windows.

What ONNX Runtime Does

  • Loads and runs ONNX models with automatic graph optimizations
  • Supports hardware acceleration via execution providers (CUDA, TensorRT, DirectML, OpenVINO, CoreML, XNNPACK)
  • Provides APIs for Python, C/C++, C#, Java, JavaScript, Objective-C, and Swift
  • Enables quantization (INT8, INT4) and mixed-precision for faster inference
  • Includes ONNX Runtime GenAI, an add-on library for efficient inference with LLMs and other generative models

Architecture Overview

ORT's core is a C++ inference engine. It takes an ONNX graph, applies platform-aware graph optimizations (operator fusion, constant folding, layout transformation), and assigns each node to an execution provider (EP) in the priority order the application requested. Each EP (e.g., CUDAExecutionProvider, TensorrtExecutionProvider) registers optimized kernel implementations for the operators it supports. The session object (InferenceSession in the Python API) manages model loading, memory allocation, and thread pooling.
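Operator fusion is easiest to see with a Conv followed by BatchNormalization, a pattern ORT's documentation lists among its basic fusions. The sketch below is plain Python, not ORT code: it assumes a 1x1 convolution evaluated at a single pixel (so the conv is just a matrix-vector product plus bias), and the weights and statistics are made-up values used only to check that the fused and unfused paths agree.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # [out_channels][in_channels]


def conv1x1(w: Matrix, b: Vector, x: Vector) -> Vector:
    """A 1x1 convolution evaluated at a single pixel: W @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bo for row, bo in zip(w, b)]


def batch_norm(y: Vector, gamma: Vector, beta: Vector,
               mean: Vector, var: Vector, eps: float = 1e-5) -> Vector:
    """Inference-mode BatchNormalization, applied per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]


def fuse_conv_bn(w: Matrix, b: Vector, gamma: Vector, beta: Vector,
                 mean: Vector, var: Vector, eps: float = 1e-5) -> Tuple[Matrix, Vector]:
    """Fold BatchNorm into the conv weights and bias so one op replaces two."""
    fused_w: Matrix = []
    fused_b: Vector = []
    for row, bo, g, bt, m, s in zip(w, b, gamma, beta, mean, var):
        k = g / math.sqrt(s + eps)  # per-output-channel scale factor
        fused_w.append([wi * k for wi in row])
        fused_b.append((bo - m) * k + bt)
    return fused_w, fused_b


# Made-up parameters for a layer with 2 input and 2 output channels.
w = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.2]
gamma, beta = [2.0, 0.5], [0.0, 1.0]
mean, var = [0.3, -0.1], [4.0, 1.0]
x = [1.5, -0.5]

unfused = batch_norm(conv1x1(w, b, x), gamma, beta, mean, var)
fw, fb = fuse_conv_bn(w, b, gamma, beta, mean, var)
fused = conv1x1(fw, fb, x)
print(unfused, fused)  # equal up to floating-point rounding
```

Because the folded weights are computed once, when the graph is optimized, the BatchNorm op disappears from the inference path entirely; this is the sense in which graph optimizations cut latency without touching the exported model.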

Self-Hosting & Configuration

  • Install CPU version: pip install onnxruntime; GPU version: pip install onnxruntime-gpu
  • Export models from PyTorch using torch.onnx.export() or from TensorFlow via tf2onnx
  • Configure execution providers by passing a provider list to InferenceSession
  • Tune thread count, memory arena, and graph optimization level via SessionOptions
  • Deploy on mobile using the ONNX Runtime Mobile package with reduced operator sets

Key Features

  • Broad hardware coverage: NVIDIA GPU, AMD GPU, Intel CPU/GPU, Apple Neural Engine, Qualcomm NPU
  • Graph optimizations reduce latency without any model changes
  • Quantization tools for INT8 and INT4 with calibration workflows
  • ONNX Runtime GenAI provides optimized pipelines for LLMs (Phi, Llama, Mistral)
  • WebAssembly and WebGPU backends enable in-browser ML inference
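The arithmetic behind INT8 quantization with calibration can be sketched in plain Python. This sketch assumes asymmetric uint8 quantization with min/max calibration, which is only one of several possible schemes. The rounding and saturation follow the ONNX QuantizeLinear definition (round half to even, then clamp to the integer range), but the helper names and calibration data are hypothetical, and this is not ORT's quantization tool.

```python
from typing import List, Tuple

QMIN, QMAX = 0, 255  # uint8 range


def minmax_calibrate(batches: List[List[float]]) -> Tuple[float, float]:
    """Track the observed activation range, widened to include 0 so 0.0 maps to an exact integer."""
    lo = min(0.0, min(min(batch) for batch in batches))
    hi = max(0.0, max(max(batch) for batch in batches))
    return lo, hi


def qparams(lo: float, hi: float) -> Tuple[float, int]:
    """Derive scale and zero point so [lo, hi] spans the full uint8 range."""
    scale = (hi - lo) / (QMAX - QMIN)
    if scale == 0.0:  # all-zero calibration data
        scale = 1.0
    zero_point = round(QMIN - lo / scale)
    return scale, zero_point


def quantize(xs: List[float], scale: float, zero_point: int) -> List[int]:
    # Python's round() on floats rounds half to even, matching QuantizeLinear.
    return [max(QMIN, min(QMAX, round(x / scale) + zero_point)) for x in xs]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    return [(q - zero_point) * scale for q in qs]


calibration = [[-1.0, 0.5, 2.0], [0.0, 1.5, -0.5]]  # made-up activation samples
lo, hi = minmax_calibrate(calibration)
scale, zp = qparams(lo, hi)

sample = [-0.75, 0.0, 1.2]
q = quantize(sample, scale, zp)
restored = dequantize(q, scale, zp)
print(scale, zp, q, restored)
```

For values inside the calibrated range, the round-trip error is at most half a quantization step (scale / 2). Including 0 in the range means exact zeros, such as padding or ReLU outputs, dequantize back to exactly 0.0.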

Comparison with Similar Tools

  • TensorRT — NVIDIA-specific with maximum GPU performance; ORT is cross-platform and supports TensorRT as a backend
  • OpenVINO — Intel-focused inference toolkit; ORT includes OpenVINO as an execution provider
  • llama.cpp — specialized for LLM inference on CPU; ORT covers broader ML model types
  • TFLite — Google's mobile inference runtime; ORT offers wider hardware EP coverage
  • Triton Inference Server — NVIDIA's model serving platform; ORT is the inference engine, not the serving layer

FAQ

Q: Which ML frameworks can export to ONNX? A: PyTorch, TensorFlow, scikit-learn, XGBoost, LightGBM, and many others have ONNX export support.

Q: Does ONNX Runtime support training? A: Yes. ORT includes training acceleration for PyTorch models using ORTModule, which applies graph optimizations during training.

Q: Can I run ONNX Runtime in a web browser? A: Yes. The onnxruntime-web package runs models in the browser via WebAssembly or WebGPU.

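Q: How do I choose the right execution provider? A: Pass your preferred providers to InferenceSession as a list in priority order. ORT assigns each part of the graph to the first provider in the list that supports it; nodes a provider cannot run fall back to the next one, with CPUExecutionProvider as the final fallback.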
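The fallback behavior can be pictured as a priority-ordered assignment. The plain-Python sketch below is a simplification: ORT's real partitioner asks each provider which nodes (or fused subgraphs) it can take. The supported-operator sets are invented for illustration and do not reflect what the real CUDA provider supports.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (node name, ONNX op type)


def assign_providers(nodes: List[Node], providers: List[str],
                     supported_ops: Dict[str, Set[str]]) -> Dict[str, str]:
    """Give each node to the first provider, in priority order, that supports its op type."""
    assignment: Dict[str, str] = {}
    for name, op_type in nodes:
        for provider in providers:
            if op_type in supported_ops.get(provider, set()):
                assignment[name] = provider
                break
        else:  # no provider claimed the node
            raise RuntimeError(f"no provider can run {name} ({op_type})")
    return assignment


# Invented capability sets; the CPU provider acts as the catch-all fallback.
supported = {
    "CUDAExecutionProvider": {"Conv", "Relu", "MatMul"},
    "CPUExecutionProvider": {"Conv", "Relu", "MatMul", "TopK"},
}
graph = [("conv_0", "Conv"), ("relu_0", "Relu"), ("topk_0", "TopK")]

plan = assign_providers(graph, ["CUDAExecutionProvider", "CPUExecutionProvider"], supported)
print(plan)
# {'conv_0': 'CUDAExecutionProvider', 'relu_0': 'CUDAExecutionProvider', 'topk_0': 'CPUExecutionProvider'}
```

One practical consequence: every boundary between a GPU partition and a CPU fallback implies a host/device copy, so a model split into many partitions can run slower than the provider list alone suggests.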

