Apr 22, 2026 · 3 min read

timm — Pretrained Vision Models and Layers for PyTorch

timm (PyTorch Image Models) is a collection of pretrained image classification models, layers, utilities, and training scripts maintained by Ross Wightman and hosted on Hugging Face.

Introduction

timm (PyTorch Image Models) is the go-to library for pretrained image classification backbones in the PyTorch ecosystem. It provides hundreds of model architectures with pretrained weights and a consistent API for creating, fine-tuning, and benchmarking vision models.

What timm Does

  • Supplies 700+ pretrained model architectures covering CNNs, Vision Transformers, and hybrids
  • Offers a single create_model() entry point that handles weight loading and head customization
  • Provides reusable layers (attention blocks, normalization, activation functions) as building blocks
  • Includes a training script (train.py) with modern augmentation and optimization defaults
  • Publishes model performance benchmarks and weight registries on Hugging Face Hub

Architecture Overview

Models are registered in a global registry keyed by name. create_model() looks up the constructor, optionally downloads pretrained weights, and replaces the classifier head to match the requested num_classes. Internally each model is a standard nn.Module. timm layers (PatchEmbed, Mlp, DropPath, etc.) are reused across architectures. A data subpackage handles augmentation pipelines (RandAugment, CutMix, Mixup) used during training.

Self-Hosting & Configuration

  • Install via pip: pip install timm (requires PyTorch)
  • All weights download automatically from Hugging Face Hub on first use
  • Customize the classifier head: timm.create_model('resnet50', num_classes=10)
  • Use timm.list_models('vit_*') to discover available architectures
  • Export to ONNX or TorchScript with standard PyTorch APIs

Key Features

  • Largest single-repo collection of vision model implementations for PyTorch
  • Consistent API across all architectures — swap backbones with one argument change
  • Regular updates with new state-of-the-art models (EfficientNet, ConvNeXt, SwinV2, EVA, etc.)
  • Built-in training recipe with competitive ImageNet accuracy out of the box
  • Integrated with Hugging Face Hub for easy weight sharing and versioning

Comparison with Similar Tools

  • torchvision.models — ships with PyTorch but covers far fewer architectures and updates less often
  • Hugging Face Transformers — broader scope (NLP, audio, vision) but timm has deeper vision-specific coverage
  • MMClassification (MMPretrain) — OpenMMLab alternative, config-driven rather than code-driven
  • CLIP — focuses on vision-language alignment, not pure classification backbones
  • Keras Applications — TensorFlow/Keras equivalent; timm is PyTorch-native

FAQ

Q: How do I fine-tune a timm model on a custom dataset? A: Call timm.create_model('efficientnet_b0', pretrained=True, num_classes=YOUR_NUM), freeze early layers if desired, and train with your own loop or the included training script.

Q: Can I use timm models for object detection or segmentation? A: Yes. Libraries like Detectron2, MMDetection, and YOLO often accept timm backbones via feature extraction mode (features_only=True).

Q: Are timm weights free to use commercially? A: Most weights use Apache-2.0 or similar permissive licenses, but check the individual model card on Hugging Face Hub.

Q: How does timm compare in speed to torchvision? A: For the same architecture the performance is essentially identical; timm just offers more choices and newer designs.
