Configs · Jul 4, 2026 · 3 min read

imbalanced-learn — Handle Imbalanced Datasets in Python

imbalanced-learn is a scikit-learn-compatible Python library providing over- and under-sampling techniques, ensemble methods, and pipeline utilities for learning from imbalanced datasets.

Agent-ready installation

This asset can be installed after choosing the runtime, reviewing the plan, and running the corresponding command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry: imbalanced-learn
Direct install command:
npx -y tokrepo@latest install 1cf4ecef-7761-11f1-9bc6-00163e2b0d79 --target codex

Run the command after confirming the plan with a dry run.

Introduction

imbalanced-learn is a Python package that extends scikit-learn with resampling techniques for imbalanced classification problems. When one class vastly outnumbers the others (as in fraud detection, medical diagnosis, or anomaly detection), classifiers trained to maximize overall accuracy tend to neglect the minority class. imbalanced-learn provides tools to rebalance the training data before a model is fit.

What imbalanced-learn Does

  • Over-sampling methods: SMOTE, ADASYN, BorderlineSMOTE, and random over-sampling (RandomOverSampler)
  • Under-sampling methods: Tomek links (TomekLinks), edited nearest neighbours (EditedNearestNeighbours), and random under-sampling (RandomUnderSampler)
  • Combination methods that chain over- and under-sampling (SMOTEENN, SMOTETomek)
  • Ensemble classifiers: BalancedRandomForestClassifier, EasyEnsembleClassifier, BalancedBaggingClassifier
  • Pipeline integration with scikit-learn for clean preprocessing workflows

Architecture Overview

The library mirrors scikit-learn's API: samplers expose fit_resample(X, y), which returns the resampled data, and the ensemble classifiers expose the standard fit/predict methods. Samplers share a common base class, so over-samplers (which generate or duplicate minority examples), under-samplers (which remove majority examples), and combined samplers can be swapped for one another. Samplers accept NumPy arrays and pandas DataFrames and can be used as steps in imblearn.pipeline.Pipeline for end-to-end workflows.

Self-Hosting & Configuration

  • Install via pip: pip install imbalanced-learn
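  • Requires Python 3.8 or later, scikit-learn, NumPy, SciPy, and joblib; minimum versions vary by release, and recent releases need a newer Python
  • No external services or GPU needed
  • Configure how much each class is resampled via the sampling_strategy parameter (a string such as 'auto', a float, a dict, or a callable)
  • imblearn.pipeline.Pipeline is a drop-in replacement for sklearn.pipeline.Pipeline that also accepts samplers as steps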

Key Features

  • scikit-learn-compatible API built around the fit_resample pattern
  • Multiple SMOTE variants for different data distributions
  • Ensemble methods designed specifically for imbalanced learning
  • Works with NumPy arrays and pandas DataFrames
  • Extensive documentation with practical examples and benchmarks

Comparison with Similar Tools

  • scikit-learn: offers a class_weight parameter on many estimators but no dedicated samplers; imbalanced-learn adds them
  • Standalone SMOTE implementations: imbalanced-learn bundles SMOTE with many variants and with under-sampling methods
  • PyOD: focuses on outlier detection; imbalanced-learn targets supervised classification
  • XGBoost scale_pos_weight: a model-level fix; imbalanced-learn operates at the data level

FAQ

Q: When should I use oversampling vs undersampling? A: Oversampling (SMOTE) works well with small datasets where losing samples is costly. Undersampling is faster and suitable when you have abundant majority samples.

Q: Can I use imbalanced-learn in a scikit-learn pipeline? A: Yes. Use imblearn.pipeline.Pipeline instead of sklearn.pipeline.Pipeline. It calls fit_resample on sampler steps during fit only, so data passed to predict or score is never resampled.

Q: Does SMOTE work with categorical features? A: Use SMOTENC (for mixed numeric/categorical) or SMOTEN (for purely categorical). Standard SMOTE handles only numeric features.

Q: Does it work with multi-class problems? A: Yes. Most samplers support multi-class targets by resampling each class independently according to the sampling_strategy parameter.
