imbalanced-learn — Handle Imbalanced Datasets in Python

Introduction

imbalanced-learn is a Python package that extends scikit-learn with resampling techniques designed for imbalanced classification problems. When one class vastly outnumbers the other (fraud detection, medical diagnosis, anomaly detection), standard classifiers tend to ignore the minority class. imbalanced-learn provides tools to rebalance the dataset before training.

What imbalanced-learn Does

Over-sampling methods: SMOTE, ADASYN, BorderlineSMOTE, and random oversampling (RandomOverSampler)
Under-sampling methods: Tomek links (TomekLinks), edited nearest neighbours (EditedNearestNeighbours), and random undersampling (RandomUnderSampler)
Combination methods that chain over- and under-sampling (SMOTEENN, SMOTETomek)
Ensemble classifiers: BalancedRandomForestClassifier, EasyEnsembleClassifier, BalancedBaggingClassifier
Pipeline integration with scikit-learn for clean preprocessing workflows

Architecture Overview

The library mirrors scikit-learn's API: samplers expose fit_resample(), and the ensemble classifiers use the standard fit/predict interface. All samplers share a common base class, so the interface is the same whether a sampler oversamples (generating or duplicating minority examples), undersamples (removing majority examples), or combines the two. Samplers accept NumPy arrays and pandas DataFrames and plug into imblearn.pipeline.Pipeline for end-to-end workflows.

Self-Hosting & Configuration

Install via pip: pip install imbalanced-learn
Requires Python 3.8+, scikit-learn, NumPy, SciPy, and joblib (minimum versions rise with newer releases, so check the install guide for the release you use)
No external services or GPU needed
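Configure sampling behavior via the sampling_strategy parameter (a string such as "auto", a float, or a dict)
imblearn.pipeline.Pipeline is a drop-in replacement for sklearn.pipeline.Pipeline that also accepts samplers as steps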

Key Features

Full scikit-learn API compatibility with fit_resample pattern
Multiple SMOTE variants for different data distributions
Ensemble methods designed specifically for imbalanced learning
Works with NumPy arrays and pandas DataFrames
Extensive documentation with practical examples and benchmarks

Comparison with Similar Tools

scikit-learn: provides the class_weight parameter on many classifiers but no dedicated resamplers; imbalanced-learn adds them
SMOTE (standalone implementations): imbalanced-learn bundles SMOTE plus many variants and undersampling methods
PyOD: focuses on outlier detection; imbalanced-learn targets supervised classification
XGBoost scale_pos_weight: a model-level fix; imbalanced-learn operates at the data level

FAQ

Q: When should I use oversampling vs undersampling? A: Oversampling (SMOTE) works well with small datasets where losing samples is costly. Undersampling is faster and suitable when you have abundant majority samples.

Q: Can I use imbalanced-learn in a scikit-learn pipeline? A: Yes. Use imblearn.pipeline.Pipeline instead of sklearn.pipeline.Pipeline. It calls fit_resample on sampler steps during fit only, so data passed to predict or score is never resampled.

Q: Does SMOTE work with categorical features? A: Use SMOTENC (for mixed numeric/categorical) or SMOTEN (for purely categorical). Standard SMOTE handles only numeric features.

Q: Does it work with multi-class problems? A: Yes. Most samplers support multi-class targets and resample each class according to the sampling_strategy parameter.
