Scripts · Jul 2, 2026 · 3 min read

ONNX Runtime — Cross-Platform ML Inference and Training Accelerator

High-performance inference engine for ONNX models across CPUs, GPUs, and edge devices with broad framework support.

Agent-ready

Agent installation ready

This asset can be installed once you have chosen a runtime, reviewed the plan, and run the matching command.

Native · 98/100
Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: ONNX Runtime Overview

Direct install command
npx -y tokrepo@latest install b0098335-7657-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan in a dry run.

Introduction

ONNX Runtime is an open-source inference engine by Microsoft that accelerates machine learning model execution across a wide range of hardware. It supports models exported from PyTorch, TensorFlow, scikit-learn, and other frameworks via the ONNX (Open Neural Network Exchange) format.

What ONNX Runtime Does

  • Runs ONNX-format models on CPU, GPU, NPU, and edge devices with optimized performance
  • Provides execution providers for CUDA, TensorRT, DirectML, OpenVINO, CoreML, and more
  • Supports both inference and training workloads, with training provided through the separate onnxruntime-training package
  • Integrates with Python, C/C++, C#, Java, JavaScript, and Objective-C
  • Applies graph optimizations and operator fusion automatically at load time
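
As a minimal sketch of what running a model looks like from Python, the snippet below loads a model on the CPU provider and feeds it random data. It assumes the model has a single float32 input; `concrete_shape` and `run_once` are hypothetical helper names introduced here for illustration, and NumPy and onnxruntime must be installed for `run_once` to work.

```python
def concrete_shape(shape, default=1):
    """Replace symbolic or unknown dimensions (names or None) with a fixed size.

    ONNX Runtime reports dynamic dimensions as strings or None, so they need
    a concrete value before a test tensor can be allocated.
    """
    return [d if isinstance(d, int) else default for d in shape]


def run_once(model_path):
    """Run one inference pass on random data (assumes one float32 input)."""
    # Imported lazily so the helper above can be used without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    first_input = session.get_inputs()[0]
    data = np.random.rand(*concrete_shape(first_input.shape)).astype(np.float32)
    # Passing None as the first argument asks for every model output.
    return session.run(None, {first_input.name: data})
```

Calling `run_once("model.onnx")` on a real model file should return a list of NumPy arrays, one per model output.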

Architecture Overview

ONNX Runtime loads an ONNX model graph and applies a series of graph transformations (constant folding, operator fusion, layout optimization) before dispatching operations to execution providers (EPs). Each provider targets specific hardware: the CUDA EP for NVIDIA GPUs, the TensorRT EP for deeper optimization on NVIDIA GPUs, the CoreML EP for Apple devices, and so on. The runtime partitions the graph among the registered providers in priority order: each provider claims the nodes it can run, and the remainder falls back to the next provider in the list, usually the CPU EP. This enables heterogeneous execution within a single model.
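
The idea of splitting one graph across several providers can be illustrated with a toy model in plain Python. This is not ONNX Runtime's actual partitioning code: real providers claim whole subgraphs rather than single nodes, and the provider names and supported-operator sets below are invented for the example.

```python
def assign_nodes(nodes, providers):
    """Assign each node to the first provider, in priority order, that supports it.

    nodes:     list of (node_name, op_type) pairs
    providers: ordered list of (provider_name, set_of_supported_op_types)
    """
    assignment = {}
    for node_name, op_type in nodes:
        for provider_name, supported_ops in providers:
            if op_type in supported_ops:
                assignment[node_name] = provider_name
                break
        else:
            # No provider in the list can run this operator.
            raise ValueError(f"no provider supports {op_type}")
    return assignment


# Hypothetical graph and operator coverage, for illustration only.
nodes = [("conv1", "Conv"), ("relu1", "Relu"), ("nms", "NonMaxSuppression")]
providers = [
    ("accelerator", {"Conv", "Relu"}),
    ("cpu", {"Conv", "Relu", "NonMaxSuppression"}),
]
assignment = assign_nodes(nodes, providers)
print(assignment)
```

In this toy run the convolution and activation land on the higher-priority provider, while the operator it does not cover falls back to the CPU-like provider.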

Self-Hosting & Configuration

  • Install via pip, conda, NuGet, Maven, or npm depending on your language
  • Select execution providers by passing them to InferenceSession: e.g., ['CUDAExecutionProvider', 'CPUExecutionProvider']
  • Use the onnxruntime-gpu package for NVIDIA GPU acceleration with CUDA 11.x or 12.x; the exact CUDA and cuDNN versions required depend on the release you install
  • Tune thread count with SessionOptions().intra_op_num_threads for CPU inference
  • Quantize models with ONNX Runtime's built-in quantization tools to reduce model size and latency
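
The provider and thread settings above might be combined roughly as follows. This is a sketch, not a recommended configuration: `choose_providers` and `make_session` are hypothetical helpers, the default of four threads is an arbitrary assumption, and `make_session` needs onnxruntime installed.

```python
def choose_providers(preferred, available):
    """Keep the preferred providers that are actually available, in order.

    The CPU provider is always appended as a last-resort fallback.
    """
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def make_session(model_path, num_threads=4):
    """Create an InferenceSession with explicit providers and thread count."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package

    options = ort.SessionOptions()
    # Number of threads used to parallelize work inside a single operator.
    options.intra_op_num_threads = num_threads
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    providers = choose_providers(
        ["CUDAExecutionProvider", "CPUExecutionProvider"],
        ort.get_available_providers(),
    )
    return ort.InferenceSession(model_path, sess_options=options, providers=providers)
```

Filtering against `get_available_providers()` keeps the same script usable on machines with and without a GPU build installed.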

Key Features

  • Broad hardware coverage via 20+ execution providers across cloud, desktop, mobile, and IoT
  • Automatic graph optimizations reduce inference latency without manual tuning
  • ONNX format interoperability lets you train in any framework and deploy uniformly
  • Reduced-precision support (INT8 quantization, FP16 conversion) for smaller models and faster inference on constrained devices
  • Production-grade stability used in Microsoft products including Office, Bing, and Azure
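
For the quantization feature, a minimal sketch using dynamic quantization from the `onnxruntime.quantization` module could look like this. The `.int8` file-naming convention and both helper names are assumptions made for the example; whether dynamic quantization helps a given model, and how much accuracy it costs, has to be measured.

```python
from pathlib import Path


def quantized_name(model_path):
    """Derive an output file name, e.g. model.onnx -> model.int8.onnx."""
    path = Path(model_path)
    return str(path.with_name(path.stem + ".int8" + path.suffix))


def quantize_to_int8(model_path):
    """Write an INT8 dynamically quantized copy of the model and return its path."""
    # Imported lazily; requires the onnxruntime package.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    output_path = quantized_name(model_path)
    # Weights are stored as signed 8-bit integers; activations are quantized at run time.
    quantize_dynamic(model_path, output_path, weight_type=QuantType.QInt8)
    return output_path
```

The quantized file loads through `InferenceSession` like any other ONNX model, so it can be compared against the original for size, latency, and accuracy.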

Comparison with Similar Tools

  • TensorRT: NVIDIA-only, with deeper GPU optimization but no cross-platform portability
  • OpenVINO: Intel-focused inference; ONNX Runtime supports Intel hardware via the OpenVINO EP
  • TFLite: Targets mobile/embedded deployment of TensorFlow models; ONNX Runtime covers more frameworks
  • Triton Inference Server: A model serving platform; ONNX Runtime is one of the inference backends it can host
  • llama.cpp: Specialized for LLM inference; ONNX Runtime is a general-purpose ML runtime

FAQ

Q: Do I need to convert my PyTorch model to ONNX first? A: Yes. Use torch.onnx.export() to convert a PyTorch model to ONNX format before loading it in ONNX Runtime.
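
A sketch of that export step is shown below. The tensor names "input" and "output", the "batch" axis label, and the helper names are illustrative assumptions, and export behavior varies somewhat between PyTorch versions, so treat this as a starting point.

```python
def batch_dynamic_axes(tensor_names):
    """Mark dimension 0 of each named tensor as a variable-size batch axis."""
    return {name: {0: "batch"} for name in tensor_names}


def export_to_onnx(model, example_input, output_path="model.onnx"):
    """Export a PyTorch module to an ONNX file and return the file path."""
    import torch  # imported lazily; requires PyTorch

    model.eval()  # disable dropout and use running batch-norm statistics
    torch.onnx.export(
        model,
        (example_input,),
        output_path,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes=batch_dynamic_axes(["input", "output"]),
    )
    return output_path
```

The resulting file can then be passed to `onnxruntime.InferenceSession`, with the input fed under the name chosen at export time.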

Q: Can ONNX Runtime run large language models? A: Yes. The ONNX Runtime GenAI library supports transformer-based LLM inference with KV-cache optimization and beam search.

Q: Does it support training or only inference? A: Both. The onnxruntime-training package supports fine-tuning and full training with optimized memory usage.

Q: What platforms does ONNX Runtime run on? A: Windows, Linux, macOS, Android, iOS, and various embedded systems. The CPU build ships as a single library; GPU and other accelerator builds also need the vendor's runtime libraries, such as CUDA and cuDNN for the CUDA EP.
