Configs · Jul 4, 2026 · 3 min read

imbalanced-learn — Handle Imbalanced Datasets in Python

imbalanced-learn is a scikit-learn-compatible Python library providing over- and under-sampling techniques, ensemble methods, and pipeline utilities for learning from imbalanced datasets.

Agent-ready

Agent installation ready

This asset can be installed after choosing a runtime, checking the plan, and running the matching command.

Native · 98/100 · Policy: allow
Agent surface
Any MCP/CLI agent
Type
Skill
Installation
Single
Trust
Trust: Established
Entry point
imbalanced-learn
Direct install command
npx -y tokrepo@latest install 1cf4ecef-7761-11f1-9bc6-00163e2b0d79 --target codex

Run after confirming the plan with a dry run.

Introduction

imbalanced-learn is a Python package that extends scikit-learn with resampling techniques designed for imbalanced classification problems. When one class vastly outnumbers the other (fraud detection, medical diagnosis, anomaly detection), standard classifiers tend to ignore the minority class. imbalanced-learn provides tools to rebalance the dataset before training.

What imbalanced-learn Does

  • Over-sampling methods: SMOTE, ADASYN, BorderlineSMOTE, and random oversampling
  • Under-sampling methods: Tomek links, edited nearest neighbours, and random undersampling
  • Combination methods (SMOTEENN, SMOTETomek) that chain over- and under-sampling
  • Ensemble classifiers: BalancedRandomForestClassifier, EasyEnsembleClassifier, BalancedBaggingClassifier, and RUSBoostClassifier
  • Pipeline integration with scikit-learn for clean preprocessing workflows

Architecture Overview

The library mirrors scikit-learn's API: samplers expose fit_resample(X, y), which returns a resampled (X, y) pair, while ensemble classifiers use the standard fit/predict methods. Samplers share a common base class with a consistent interface, whether they over-sample (generate synthetic minority examples), under-sample (remove majority examples), or combine both. Samplers accept NumPy arrays and pandas DataFrames and plug into imblearn.pipeline.Pipeline for end-to-end workflows.

Self-Hosting & Configuration

  • Install via pip: pip install imbalanced-learn
  • Requires Python 3.8+, scikit-learn, NumPy, SciPy, and joblib
  • No external services or GPU needed
  • Configure resampling through the sampling_strategy parameter (a string such as "auto", a float, a dict, or a callable)
  • imblearn.pipeline.Pipeline is a drop-in replacement for sklearn.pipeline.Pipeline

Key Features

  • Full scikit-learn API compatibility, extended with the fit_resample method for samplers
  • Multiple SMOTE variants for different data distributions
  • Ensemble methods designed specifically for imbalanced learning
  • Works with NumPy arrays and pandas DataFrames
  • Extensive documentation with practical examples and benchmarks
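The SMOTE variants share one core step: each synthetic sample lies at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The pure-Python function below is a simplified illustration of that step only; smote_sketch is a hypothetical name, not part of imbalanced-learn, whose SMOTE is vectorized and built on scikit-learn's nearest-neighbour search. Variants such as BorderlineSMOTE and ADASYN change which minority samples serve as starting points, or how many points each one generates, rather than the interpolation itself.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Simplified illustration of SMOTE's interpolation step; hypothetical helper,
    not imbalanced-learn's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest minority neighbours (excluding itself), by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        chosen = rng.choice(neighbours)
        # Step a random fraction of the way from base towards the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (c - b) for b, c in zip(base, chosen)])
    return synthetic


minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=3, k=2)
print(new_points)
```

Because every new point is a convex combination of two existing minority points, synthetic samples never leave the region spanned by the minority class.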

Comparison with Similar Tools

  • scikit-learn: many estimators accept a class_weight parameter, but scikit-learn ships no imbalance-oriented samplers; imbalanced-learn adds dedicated samplers
  • SMOTE (standalone): imbalanced-learn bundles SMOTE plus many variants and undersampling methods
  • PyOD: focuses on outlier detection; imbalanced-learn targets supervised classification
  • XGBoost scale_pos_weight: a model-level fix; imbalanced-learn operates at the data level
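The model-level alternatives leave the data untouched and reweight errors instead. The sketch below computes scikit-learn's documented "balanced" class-weight formula, n_samples / (n_classes * count), and the starting value XGBoost's documentation suggests for scale_pos_weight, (negative count) / (positive count), on a hypothetical 180/20 label vector. Both helper functions are illustrative names, not library APIs.

```python
from collections import Counter


def balanced_class_weights(y):
    """scikit-learn's "balanced" heuristic: n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def scale_pos_weight(y, positive=1):
    """XGBoost's suggested starting value: (# negative) / (# positive)."""
    n_pos = Counter(y)[positive]
    return (len(y) - n_pos) / n_pos


y = [0] * 180 + [1] * 20
print(balanced_class_weights(y))  # {0: 0.5555555555555556, 1: 5.0}
print(scale_pos_weight(y))        # 9.0
```

Resampling with imbalanced-learn would instead change the 180/20 rows themselves, which is why the two approaches can also be combined.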

FAQ

Q: When should I use oversampling vs undersampling? A: Oversampling (SMOTE) works well with small datasets where losing samples is costly. Undersampling is faster and suitable when you have abundant majority samples.
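The trade-off in this answer is largely about dataset size. The stdlib-only sketch below balances a 180/20 dataset both ways; random_resample is a hypothetical helper, a simplified stand-in for RandomOverSampler and RandomUnderSampler. Over-sampling keeps every original row and grows the set to 360 rows, while under-sampling shrinks it to 40 and discards 160 majority rows.

```python
import random
from collections import defaultdict


def random_resample(X, y, mode, seed=0):
    """Balance classes by duplicating rows (mode="over") or dropping rows (mode="under").

    Hypothetical stdlib-only helper for illustration; imbalanced-learn's
    RandomOverSampler and RandomUnderSampler are the library equivalents.
    """
    if mode not in ("over", "under"):
        raise ValueError('mode must be "over" or "under"')
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    sizes = [len(rows) for rows in by_class.values()]
    # Over-sampling grows every class to the largest size; under-sampling shrinks to the smallest.
    target = max(sizes) if mode == "over" else min(sizes)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) < target:
            # Keep all originals and add random duplicates (with replacement).
            picked = rows + rng.choices(rows, k=target - len(rows))
        else:
            # Keep a random subset of size target (without replacement).
            picked = rng.sample(rows, target)
        X_out.extend(picked)
        y_out.extend([label] * target)
    return X_out, y_out


X = [[float(i)] for i in range(200)]
y = [0] * 180 + [1] * 20
X_over, y_over = random_resample(X, y, mode="over")
X_under, y_under = random_resample(X, y, mode="under")
print(len(y_over), len(y_under))  # 360 40
```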

Q: Can I use imbalanced-learn in a scikit-learn pipeline? A: Yes. Use imblearn.pipeline.Pipeline instead of sklearn.pipeline.Pipeline. It calls fit_resample on sampler steps during fit and skips them during predict, so resampling is applied only to training data.

Q: Does SMOTE work with categorical features? A: Use SMOTENC (for mixed numeric/categorical) or SMOTEN (for purely categorical). Standard SMOTE handles only numeric features.

Q: Does it work with multi-class problems? A: Yes. Most samplers support multi-class problems by resampling each class independently according to the sampling_strategy parameter.
