Configs · Apr 28, 2026 · 2 min read

torchvision — Computer Vision Models, Datasets & Transforms for PyTorch

The official PyTorch computer vision package providing pretrained models, common datasets, and image transformation utilities for training and evaluation.

Introduction

torchvision is the official computer vision library in the PyTorch ecosystem. It ships production-ready model architectures, pretrained weights, dataset loaders, and image transforms so researchers and engineers can build vision pipelines without reimplementing common components.

What torchvision Does

  • Provides 50+ pretrained model architectures (ResNet, EfficientNet, ViT, Swin, Faster R-CNN)
  • Includes dataset wrappers for ImageNet, COCO, VOC, CIFAR, and more
  • Offers composable image transforms (v2 API with joint image/target transforms)
  • Supplies utilities for bounding box, mask, and keypoint manipulation
  • Bundles efficient C++/CUDA operators for NMS, RoI pooling, and deformable convolutions

Architecture Overview

torchvision is organized into four main modules: models (architectures and pretrained weights), datasets (download and load benchmarks), transforms (preprocessing and augmentation pipelines), and ops (C++/CUDA operators), with io and utils providing supporting helpers. The transforms v2 API accepts arbitrary nested structures such as tuples and dicts, and applies the same randomly sampled parameters to an image and its annotations in a single call.

Self-Hosting & Configuration

  • Install the torchvision release that matches your PyTorch version and CUDA build
  • Use pip, conda, or build from source for custom CUDA support
  • Pretrained weights download automatically on first use and are cached under the torch hub directory, which TORCH_HOME can relocate
  • Combine with torchdata or torch.utils.data.DataLoader for batched loading
  • Configure transforms pipelines declaratively using transforms.Compose

Key Features

  • Multi-weight API allowing selection of specific pretrained checkpoints per model
  • Transforms v2 with support for bounding boxes, segmentation masks, and videos
  • Models can be exported to ONNX through torch.onnx.export for deployment
  • Video reading and decoding utilities via torchvision.io
  • Quantization-ready model variants for efficient inference

Comparison with Similar Tools

  • timm — Larger model zoo for image classification; torchvision covers detection and segmentation too
  • Albumentations — Richer augmentation library but not tightly integrated with PyTorch models
  • OpenCV — General-purpose vision library; torchvision is specifically for deep learning workflows
  • Keras Applications — TensorFlow ecosystem equivalent; fewer detection/segmentation models

FAQ

Q: How do I load a pretrained model? A: Use torchvision.models.resnet50(weights="IMAGENET1K_V2"). The multi-weight API lets you pick specific checkpoint versions.

Q: Can torchvision handle video data? A: Yes. torchvision.io provides video reading, and transforms v2 supports video tensor augmentation.

Q: What is transforms v2? A: The new transform API that jointly transforms images and their annotations (boxes, masks) with consistent random parameters.

Q: Does torchvision support object detection? A: Yes. It includes Faster R-CNN, Mask R-CNN, RetinaNet, FCOS, SSD, and SSDlite with pretrained COCO weights.
