# Auto-Sklearn — Automated Machine Learning with Scikit-Learn

> Auto-Sklearn is an AutoML toolkit that automatically selects scikit-learn algorithms and tunes hyperparameters using Bayesian optimization, meta-learning, and ensemble construction to build high-accuracy models.

## Install

Install from PyPI with `pip install auto-sklearn`; the Quick Use snippet below includes this step.

## Quick Use
```bash
pip install auto-sklearn
python -c "
import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
cls = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
cls.fit(X[:1200], y[:1200])
predictions = cls.predict(X[1200:])
print(f'Accuracy: {sklearn.metrics.accuracy_score(y[1200:], predictions):.4f}')
"
```

## Introduction
Auto-Sklearn wraps the scikit-learn ecosystem with automated algorithm selection and hyperparameter optimization. It uses meta-learning to warm-start the search and Bayesian optimization via SMAC to efficiently explore the space of classifiers, regressors, and preprocessing pipelines.

## What Auto-Sklearn Does
- Selects the best scikit-learn algorithm from a pool of 15 classifiers and 14 feature preprocessors
- Tunes hyperparameters using Sequential Model-based Algorithm Configuration (SMAC)
- Applies meta-learning from 140 benchmark datasets to warm-start the optimization
- Constructs weighted ensembles from the top-performing models for better generalization
- Outputs a standard scikit-learn estimator that works with predict, score, and pipeline APIs
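
The meta-learning warm start in the list above can be pictured as a nearest-neighbour lookup over dataset meta-features. The sketch below is a toy illustration, not Auto-Sklearn's internal code: the dataset names, meta-feature vectors, stored configurations, and the `warm_start_configs` helper are all hypothetical, and L1 distance is assumed as the similarity measure.

```python
from typing import Dict, List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two meta-feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def warm_start_configs(
    new_meta: Sequence[float],
    known: Dict[str, Tuple[List[float], dict]],
    k: int = 2,
) -> List[dict]:
    """Return the stored best configurations of the k most similar datasets."""
    ranked = sorted(known.items(), key=lambda item: l1_distance(new_meta, item[1][0]))
    return [config for _, (_, config) in ranked[:k]]


# Hypothetical meta-features: [number of classes, number of features, sparsity].
known_datasets = {
    "digits_like": ([10.0, 64.0, 0.0], {"classifier": "random_forest"}),
    "tiny_text": ([2.0, 5000.0, 0.9], {"classifier": "sgd"}),
    "wide_numeric": ([8.0, 80.0, 0.0], {"classifier": "extra_trees"}),
}
new_meta = [10.0, 60.0, 0.0]

# These configurations would seed the optimizer before it explores on its own.
initial_configs = warm_start_configs(new_meta, known_datasets, k=2)
print(initial_configs)
```

In practice the meta-features would need to be normalized so that one large-valued feature does not dominate the distance; the toy values here skip that step.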

## Architecture Overview
Auto-Sklearn frames model building as a combined algorithm selection and hyperparameter optimization (CASH) problem. It builds a configuration space spanning its supported scikit-learn estimators, preprocessors, and their hyperparameters, then uses SMAC to propose configurations from that space. Meta-learning narrows the initial search by retrieving configurations that performed well on similar datasets. An ensemble selection procedure then combines the best models evaluated within the time budget using greedy forward selection with replacement.
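
Greedy forward selection with replacement can be sketched in a few lines. The version below is a simplified illustration under stated assumptions, not the library's implementation: it scores candidates with mean squared error on held-out predictions, and the model names and prediction values are made up.

```python
from collections import Counter
from typing import Dict, List


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def ensemble_selection(
    val_preds: Dict[str, List[float]],
    y_val: List[float],
    ensemble_size: int = 10,
) -> Dict[str, float]:
    """Pick models one at a time, allowing repeats, to minimize validation loss.

    Returns a weight per selected model: times picked / ensemble_size.
    """
    running_sum = [0.0] * len(y_val)
    picks: Counter = Counter()
    for step in range(1, ensemble_size + 1):
        best_name, best_loss = None, float("inf")
        for name, preds in val_preds.items():
            # Loss of the averaged ensemble if this model were added next.
            candidate = [(s + p) / step for s, p in zip(running_sum, preds)]
            loss = mse(candidate, y_val)
            if loss < best_loss:
                best_name, best_loss = name, loss
        picks[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    return {name: count / ensemble_size for name, count in picks.items()}


# Hypothetical validation predictions from three already-trained models.
y_val = [1.0, 0.0, 1.0, 0.0]
val_preds = {
    "a": [0.9, 0.1, 0.9, 0.1],    # slightly under-confident
    "b": [1.2, -0.2, 1.2, -0.2],  # overshoots in the opposite direction
    "c": [0.5, 0.5, 0.5, 0.5],    # uninformative
}
weights = ensemble_selection(val_preds, y_val, ensemble_size=3)
print(weights)
```

Because a model can be selected more than once, the selection counts act as ensemble weights, and models that never help (like `c` here) simply receive no weight.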

## Self-Hosting & Configuration
- Install via pip on Linux; requires Python 3.7-3.10 and a POSIX environment
- Set time_left_for_this_task to control total optimization time in seconds
- Use per_run_time_limit to cap individual model training time
- Pass include or exclude dictionaries to restrict the search to specific model families
- Enable resampling_strategy='cv' for cross-validation instead of holdout evaluation
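
The options above can be combined into one set of constructor arguments. This is a minimal sketch rather than a recommended configuration: the helper names are hypothetical, the component names `random_forest` and `extra_trees` and the dictionary form of `include` assume a recent auto-sklearn release, and the one-tenth split for `per_run_time_limit` is just an illustrative choice.

```python
def build_search_kwargs(total_seconds: int = 600, folds: int = 5) -> dict:
    """Assemble keyword arguments for AutoSklearnClassifier (illustrative values)."""
    return {
        # Total optimization budget in seconds.
        "time_left_for_this_task": total_seconds,
        # Cap on a single model fit; one tenth of the budget is an assumption.
        "per_run_time_limit": total_seconds // 10,
        # Restrict the search to two tree-based model families.
        "include": {"classifier": ["random_forest", "extra_trees"]},
        # Evaluate each configuration with k-fold cross-validation.
        "resampling_strategy": "cv",
        "resampling_strategy_arguments": {"folds": folds},
    }


def make_classifier(**kwargs):
    """Create the estimator; imported lazily so the helper above runs without auto-sklearn."""
    from autosklearn.classification import AutoSklearnClassifier

    return AutoSklearnClassifier(**kwargs)


search_kwargs = build_search_kwargs(total_seconds=600, folds=5)
print(search_kwargs)
```

A typical use would be `automl = make_classifier(**search_kwargs)` followed by `automl.fit(X_train, y_train)`. With cross-validation, the project's examples also call `automl.refit(X_train, y_train)` before predicting so the final models are trained on all the training data.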

## Key Features
- Fully automatic pipeline construction from raw features to final predictions
- Meta-learning accelerates search by starting from historically strong configurations
- Ensemble construction combines top models for accuracy beyond any single model
- Supports classification and regression with dedicated AutoSklearn classes
- Compatible with scikit-learn scoring functions and cross-validation utilities

## Comparison with Similar Tools
- **AutoKeras** — targets deep learning architecture search; Auto-Sklearn optimizes classical ML pipelines
- **TPOT** — uses genetic programming for pipeline search; Auto-Sklearn uses Bayesian optimization
- **H2O AutoML** — enterprise platform with distributed training; Auto-Sklearn is lightweight and scikit-learn-native
- **FLAML** — fast cost-effective AutoML; Auto-Sklearn includes meta-learning and ensemble construction
- **PyCaret** — low-code ML library; Auto-Sklearn provides deeper automated optimization

## FAQ
**Q: Does Auto-Sklearn work on Windows?**
A: It officially supports Linux. On Windows, use WSL2 or Docker for a compatible environment.

**Q: How much time should I allocate for the search?**
A: For tabular datasets under 100K rows, 10-30 minutes often produces competitive results. Larger datasets benefit from longer budgets.

**Q: Can I inspect which models were selected?**
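A: Yes. Call show_models() to see the ensemble composition, or leaderboard() to rank evaluated configurations. By default the leaderboard lists only models that made it into the ensemble; pass ensemble_only=False to include the other evaluated runs.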
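
As a small sketch of that inspection step, the hypothetical helper below wraps both calls. It assumes `automl` is an already fitted Auto-Sklearn estimator and that the installed version accepts the `ensemble_only` and `top_k` arguments on `leaderboard()`.

```python
def summarize_search(automl, top_k: int = 5):
    """Return the top-k leaderboard rows and the ensemble members of a fitted estimator."""
    # ensemble_only=False also lists runs that did not enter the final ensemble.
    board = automl.leaderboard(ensemble_only=False, top_k=top_k)
    # Description of each model that carries weight in the ensemble.
    members = automl.show_models()
    return board, members
```

Calling `board, members = summarize_search(cls)` after `cls.fit(...)` would give a table to print and a description of the selected pipelines; the exact return types depend on the library version.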

**Q: Does it handle missing values and categorical features?**
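A: Auto-Sklearn includes imputation and categorical encoding in its preprocessing pipeline, so missing values are handled automatically. Categorical columns need to be identifiable as such, for example by using the pandas category dtype or by passing feat_type to fit().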

## Sources
- https://github.com/automl/auto-sklearn
- https://automl.github.io/auto-sklearn/

---
Source: https://tokrepo.com/en/workflows/asset-e2215791
Author: Script Depot