Scripts · Apr 22, 2026 · 3 min read

Albumentations — Fast Image Augmentation Library for ML Pipelines

Albumentations is a fast and flexible image augmentation library for machine learning that supports classification, segmentation, and detection tasks with a composable transform API.

Introduction

Albumentations provides a fast, composable API for image augmentation in computer vision pipelines. It wraps OpenCV operations in a declarative transform interface and supports pixel-level, spatial, and domain-specific transforms for classification, segmentation, object detection, and keypoint tasks.

What Albumentations Does

  • Offers 70+ augmentation transforms: geometric, color, blur, noise, weather, and more
  • Handles bounding boxes, segmentation masks, and keypoints alongside image transforms automatically
  • Composes transforms with Compose, OneOf, SomeOf, and ReplayCompose for reproducibility
  • Runs transforms on NumPy arrays using optimized OpenCV backends for speed
  • Integrates with PyTorch, TensorFlow, and other frameworks via simple dataset wrappers

Architecture Overview

Transforms inherit from ImageOnlyTransform or DualTransform (the latter also modifies masks/bboxes). Compose chains transforms and applies them sequentially, passing an augmented dictionary with keys like image, mask, bboxes, and keypoints. Bounding box formats (Pascal VOC, COCO, YOLO, Albumentations) are converted internally via BboxParams. Probabilities and parameter ranges are set per transform, giving fine-grained control.

Self-Hosting & Configuration

  • Install via pip: pip install albumentations
  • Define a pipeline: A.Compose([A.Resize(224, 224), A.Normalize(), ToTensorV2()]) (with from albumentations.pytorch import ToTensorV2)
  • Pass bounding box format: A.Compose([...], bbox_params=A.BboxParams(format='coco'))
  • Save and load pipelines: A.save(transform, 'pipeline.json') and A.load('pipeline.json')
  • Use ReplayCompose to record which augmentations were applied for debugging

Key Features

  • Among the fastest Python augmentation libraries, thanks to OpenCV-backed pixel operations
  • Unified API for image, mask, bounding box, and keypoint augmentation in a single pass
  • Serialization support lets you version-control augmentation pipelines as JSON or YAML
  • Large community with 70+ transforms covering standard and creative augmentations
  • Battle-tested in Kaggle competitions and production vision pipelines

Comparison with Similar Tools

  • torchvision.transforms — built into PyTorch, but the classic API is slower and lacks native bbox/mask support (the newer v2 transforms narrow this gap)
  • Kornia — differentiable augmentations on GPU tensors; Albumentations works on NumPy/CPU
  • imgaug — similar scope but less actively maintained and generally slower
  • Augmentor — pipeline-based but narrower transform set
  • NVIDIA DALI — GPU-accelerated data loading and augmentation; heavier setup

FAQ

Q: How does Albumentations handle bounding boxes during spatial transforms? A: When you set bbox_params, spatial transforms (crop, rotate, flip) automatically adjust bounding box coordinates and clip or remove boxes that fall outside the image.

Q: Can I use Albumentations with TensorFlow/Keras? A: Yes. Apply transforms to NumPy arrays in your data generator or tf.data pipeline before converting to tensors.

Q: Why is Albumentations faster than torchvision transforms? A: It uses OpenCV for pixel operations and NumPy for spatial math, which are faster than PIL-based transforms used by torchvision.

Q: How do I add a custom transform? A: Subclass ImageOnlyTransform or DualTransform, implement apply() and optionally apply_to_mask(), and use it inside Compose.
