Scripts · Jul 2, 2026 · 3 min read

ONNX Runtime — Cross-Platform ML Inference and Training Accelerator

High-performance inference engine for ONNX models across CPUs, GPUs, and edge devices with broad framework support.

Direct install command
npx -y tokrepo@latest install b0098335-7657-11f1-9bc6-00163e2b0d79 --target codex

Run it after confirming the plan with a dry run.

Introduction

ONNX Runtime is an open-source inference engine by Microsoft that accelerates machine learning model execution across a wide range of hardware. It supports models exported from PyTorch, TensorFlow, scikit-learn, and other frameworks via the ONNX (Open Neural Network Exchange) format.

What ONNX Runtime Does

  • Runs ONNX-format models on CPU, GPU, NPU, and edge devices with optimized performance
  • Provides execution providers for CUDA, TensorRT, DirectML, OpenVINO, CoreML, and more
  • Supports both inference and training workloads (training through the separate onnxruntime-training package)
  • Integrates with Python, C/C++, C#, Java, JavaScript, and Objective-C
  • Applies graph optimizations and operator fusion automatically at load time

Architecture Overview

ONNX Runtime loads an ONNX model graph and applies a series of graph transformations (constant folding, operator fusion, layout optimization) before dispatching operations to execution providers (EPs). Each provider targets specific hardware: the CUDA EP for NVIDIA GPUs, the TensorRT EP for further optimization on NVIDIA GPUs, the CoreML EP for Apple devices, and so on. The runtime partitions the graph across the registered providers in priority order, assigning each node to the first provider that supports it and falling back to the CPU provider otherwise. This enables heterogeneous execution within a single model.
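
The per-node provider assignment described above can be illustrated with a toy model. This is only a sketch of the idea, not ONNX Runtime's actual partitioning code: the `SUPPORTED_OPS` table and the `assign_nodes` helper are hypothetical, and real providers report their own capabilities to the runtime.

```python
# Toy model of priority-ordered graph partitioning.
# SUPPORTED_OPS is a made-up capability table used only for illustration.
SUPPORTED_OPS = {
    "CUDAExecutionProvider": {"Conv", "Relu", "MatMul"},
    "CPUExecutionProvider": {"Conv", "Relu", "MatMul", "NonMaxSuppression"},
}


def assign_nodes(node_ops, providers):
    """Map each node index to the first provider in `providers` that supports its op."""
    assignment = {}
    for index, op in enumerate(node_ops):
        for provider in providers:
            if op in SUPPORTED_OPS.get(provider, set()):
                assignment[index] = provider
                break
        else:
            # No provider in the list claimed this op.
            raise ValueError(f"no provider supports op {op!r}")
    return assignment


graph = ["Conv", "Relu", "NonMaxSuppression"]
plan = assign_nodes(graph, ["CUDAExecutionProvider", "CPUExecutionProvider"])
for index, provider in plan.items():
    print(index, graph[index], provider)
```

In this toy graph the first two nodes land on the higher-priority provider and the third falls back to the CPU provider, which is the kind of mixed placement the paragraph above calls heterogeneous execution.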

Self-Hosting & Configuration

  • Install via pip, conda, NuGet, Maven, or npm depending on your language
  • Select execution providers by passing them to InferenceSession: e.g., ['CUDAExecutionProvider', 'CPUExecutionProvider']
  • Use the onnxruntime-gpu package for NVIDIA GPU acceleration; the installed CUDA version (11.x or 12.x) must match the one the package was built against
  • Tune thread count with SessionOptions().intra_op_num_threads for CPU inference
  • Quantize models with ONNX Runtime's built-in quantization tools to reduce model size and latency
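
As a rough illustration of the provider and thread settings listed above, the following sketch builds a session that prefers CUDA and falls back to CPU. The helper names `pick_providers` and `make_session`, the default of 4 threads, and the model path in the usage note are assumptions made for the example; the `onnxruntime` calls used are `SessionOptions`, `get_available_providers`, and `InferenceSession`.

```python
def pick_providers(preferred, available):
    """Keep the preferred order, dropping providers this build does not offer."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers is available")
    return chosen


def make_session(model_path, intra_op_threads=4):
    """Create an InferenceSession that prefers CUDA and falls back to CPU."""
    # Imported here so pick_providers stays usable without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Number of threads used to parallelize work inside a single operator.
    options.intra_op_num_threads = intra_op_threads
    providers = pick_providers(
        ["CUDAExecutionProvider", "CPUExecutionProvider"],
        ort.get_available_providers(),
    )
    return ort.InferenceSession(model_path, sess_options=options, providers=providers)
```

A call such as `make_session("model.onnx")` (with a placeholder path) returns a session whose `get_providers()` method reports which providers were actually registered, which is a quick way to confirm that the GPU build is in use.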

Key Features

  • Broad hardware coverage via 20+ execution providers across cloud, desktop, mobile, and IoT
  • Automatic graph optimizations reduce inference latency without manual tuning
  • ONNX format interoperability lets you train in any framework and deploy uniformly
  • Reduced-precision support (INT8 quantization, FP16 conversion) for smaller models and faster inference on constrained devices
  • Production-grade stability used in Microsoft products including Office, Bing, and Azure
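
To make the INT8 point above concrete, here is a small sketch of the arithmetic behind symmetric per-tensor quantization. It is an illustration in plain Python, not ONNX Runtime's implementation (its own tooling lives in the `onnxruntime.quantization` module); the function names and sample weights are made up for the example.

```python
def quantize_symmetric_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


weights = [-0.82, -0.1, 0.0, 0.33, 0.9]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
# Rounding to the nearest integer code bounds the error by half a step.
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, worst)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per value. That trade is where the size and latency savings come from.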

Comparison with Similar Tools

  • TensorRT — NVIDIA-only, deeper GPU optimization but no cross-platform portability
  • OpenVINO — Intel-focused inference; ONNX Runtime supports Intel via the OpenVINO EP
  • TFLite — Targets mobile/embedded for TensorFlow models; ONNX Runtime covers more frameworks
  • Triton Inference Server — Model serving platform; ONNX Runtime is the inference engine it can host
  • llama.cpp — Specialized for LLM inference; ONNX Runtime is a general-purpose ML runtime

FAQ

Q: Do I need to convert my PyTorch model to ONNX first? A: Yes. Use torch.onnx.export() to convert a PyTorch model to ONNX format before loading it in ONNX Runtime.
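
A minimal sketch of that export step, assuming PyTorch is installed. The helper name `export_to_onnx`, the tensor names "input" and "output", and the dynamic batch axis are choices made for this example, not requirements of the API.

```python
def export_to_onnx(model, example_input, output_path):
    """Export a torch.nn.Module to an ONNX file with a dynamic batch dimension."""
    import torch  # deferred: only needed when the function is actually called

    # Switch to inference mode so dropout and batch-norm behave deterministically.
    model.eval()
    torch.onnx.export(
        model,
        (example_input,),          # example inputs, passed as a tuple
        output_path,
        input_names=["input"],     # illustrative tensor names
        output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )
    return output_path
```

For instance, `export_to_onnx(torch.nn.Linear(4, 2), torch.randn(1, 4), "linear.onnx")` would write a small model file that can then be passed to `onnxruntime.InferenceSession`.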

Q: Can ONNX Runtime run large language models? A: Yes. The ONNX Runtime GenAI library supports transformer-based LLM inference with KV-cache optimization and beam search.

Q: Does it support training or only inference? A: Both. The onnxruntime-training package supports fine-tuning and full training with optimized memory usage.

Q: What platforms does ONNX Runtime run on? A: Windows, Linux, macOS, Android, iOS, and various embedded systems. The CPU build ships as a self-contained library; GPU and other accelerator builds also need the vendor runtime (for example, CUDA and cuDNN for the CUDA EP).
