Configs · July 4, 2026 · 1 min read

imbalanced-learn — Handle Imbalanced Datasets in Python

imbalanced-learn is a scikit-learn-compatible Python library providing over- and under-sampling techniques, ensemble methods, and pipeline utilities for learning from imbalanced datasets.

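Agent ready

Agents can install this directly

This asset is installable: the agent first selects the current runtime, checks the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed
Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry: imbalanced-learn
Direct install command:
npx -y tokrepo@latest install 1cf4ecef-7761-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run to confirm the install plan before running this command.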

Introduction

imbalanced-learn is a Python package that extends scikit-learn with resampling techniques designed for imbalanced classification problems. When one class vastly outnumbers the other (fraud detection, medical diagnosis, anomaly detection), standard classifiers tend to ignore the minority class. imbalanced-learn provides tools to rebalance the dataset before training.

What imbalanced-learn Does

  • Over-sampling methods: SMOTE, ADASYN, BorderlineSMOTE, and random oversampling
  • Under-sampling methods: Tomek links, edited nearest neighbours, random undersampling
  • Combination methods that chain over- and under-sampling, such as SMOTEENN and SMOTETomek
  • Ensemble classifiers: BalancedRandomForestClassifier, EasyEnsembleClassifier, BalancedBaggingClassifier
  • Pipeline integration with scikit-learn for clean preprocessing workflows

Architecture Overview

The library mirrors scikit-learn's API: samplers expose fit_resample(X, y), which returns the resampled data and labels, and ensemble classifiers expose the standard fit/predict methods. Samplers derive from a common base class, so over-samplers (which duplicate or synthesize minority examples), under-samplers (which remove majority examples), and combined methods share one interface. Samplers accept NumPy arrays and pandas DataFrames and can be used as steps in imblearn.pipeline.Pipeline for end-to-end workflows.

Self-Hosting & Configuration

  • Install via pip: pip install imbalanced-learn
  • Requires Python 3.8+ (newer releases may require a later Python version), scikit-learn, NumPy, SciPy, and joblib
  • No external services or GPU needed
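  • Configure sampling strategies via the sampling_strategy parameter (a string such as 'auto', a float for binary problems, a dict of per-class sample counts, or a callable)
  • imblearn.pipeline.Pipeline is a drop-in replacement for sklearn.pipeline.Pipeline that also accepts samplers as steps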

Key Features

  • Full scikit-learn API compatibility with fit_resample pattern
  • Multiple SMOTE variants for different data distributions
  • Ensemble methods designed specifically for imbalanced learning
  • Works with NumPy arrays and pandas DataFrames
  • Extensive documentation with practical examples and benchmarks

Comparison with Similar Tools

  • scikit-learn — provides class_weight parameter but no resampling; imbalanced-learn adds dedicated samplers
  • SMOTE (standalone) — imbalanced-learn bundles SMOTE plus many variants and undersampling methods
  • PyOD — focuses on outlier detection; imbalanced-learn targets supervised classification
  • XGBoost scale_pos_weight — model-level fix; imbalanced-learn operates at the data level

FAQ

Q: When should I use oversampling vs undersampling? A: Oversampling (for example SMOTE) works well with small datasets where discarding samples is costly. Undersampling shrinks the training set, so it trains faster and suits cases where majority samples are abundant.

Q: Can I use imbalanced-learn in a scikit-learn pipeline? A: Yes. Use imblearn.pipeline.Pipeline instead of sklearn.pipeline.Pipeline. It calls fit_resample on sampler steps during fit only, so data passed to predict or score is not resampled.

Q: Does SMOTE work with categorical features? A: Use SMOTENC (for mixed numeric/categorical) or SMOTEN (for purely categorical). Standard SMOTE handles only numeric features.

Q: Does it work with multi-class problems? A: Yes. Most samplers support multi-class targets by resampling each class independently according to the sampling_strategy parameter.
