# TPOT — Automated Machine Learning with Genetic Programming

> TPOT uses genetic programming to automatically design and optimize machine learning pipelines, selecting the best models and preprocessing steps from scikit-learn.

## Install

```bash
pip install tpot
```

## Quick Use

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.2)

clf = TPOTClassifier(generations=5, population_size=20, verbosity=2)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
clf.export('best_pipeline.py')
```

## Introduction

TPOT (Tree-based Pipeline Optimization Tool) automates the most tedious parts of machine learning by intelligently exploring thousands of possible pipeline configurations. It uses genetic programming to evolve scikit-learn pipelines, freeing data scientists from manual feature engineering and model selection.

## What TPOT Does

- Evolves complete ML pipelines using genetic programming
- Automatically selects preprocessing, feature engineering, and model steps
- Exports the best pipeline as a standalone Python script
- Supports classification and regression tasks out of the box
- Integrates with scikit-learn estimators and transformers

## Architecture Overview

TPOT represents each pipeline as a tree whose nodes are scikit-learn operators. A genetic algorithm mutates, crosses over, and selects pipelines across generations, evaluating fitness via cross-validation. The final champion pipeline is exported as clean Python code built from scikit-learn primitives.
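Because each evolved tree is built from scikit-learn operators, a champion maps directly onto an ordinary scikit-learn `Pipeline`. The following is a hypothetical illustration of what an exported champion can look like, not the output of a real TPOT run; the operator choices (`PolynomialFeatures` feeding a `RandomForestClassifier`) are assumptions for the sketch:

```python
# Hypothetical shape of a TPOT champion: a plain scikit-learn pipeline
# with a preprocessing step followed by a tuned estimator.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.2, random_state=42)

# The tree (PolynomialFeatures -> RandomForestClassifier) flattens
# into a two-step Pipeline that fits and scores like any estimator.
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
exported_pipeline.fit(X_train, y_train)
print(exported_pipeline.score(X_test, y_test))
```

Because the result is pure scikit-learn, it can be versioned, reviewed, and deployed without TPOT installed.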
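The operator search space can also be narrowed with a custom configuration dictionary, where keys are import paths of scikit-learn operators and values are hyperparameter grids the genetic search may draw from. A minimal sketch, assuming the classic TPOT (0.x) `config_dict` format; the specific operators and grids here are illustrative:

```python
# A restricted search space: two classifiers and one preprocessor.
# Each key is a scikit-learn import path; each value is the
# hyperparameter grid the genetic search is allowed to sample.
tpot_config = {
    'sklearn.naive_bayes.GaussianNB': {},
    'sklearn.tree.DecisionTreeClassifier': {
        'criterion': ['gini', 'entropy'],
        'max_depth': range(1, 11),
    },
    'sklearn.preprocessing.StandardScaler': {},
}

# Passed to the estimator as, e.g.:
#   TPOTClassifier(generations=5, population_size=20,
#                  config_dict=tpot_config)
```

Restricting the space this way trades breadth for speed: fewer candidate operators means each generation evaluates faster.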
## Self-Hosting & Configuration

- Install via pip, with optional dependencies for XGBoost and Dask
- Set `generations` and `population_size` to control search thoroughness
- Use `n_jobs=-1` to parallelize fitness evaluation across all cores
- Enable the Dask backend for distributed pipeline search on clusters
- Set the `scoring` parameter to match your evaluation metric

## Key Features

- Zero-config AutoML that finds competitive pipelines automatically
- Exports reproducible Python code rather than opaque model objects
- Supports custom operator sets and search constraints
- Built-in stacking ensemble capabilities
- Warm start to resume optimization from a previous run

## Comparison with Similar Tools

- **AutoGluon** — broader scope (tabular, text, and image); TPOT focuses on scikit-learn pipeline optimization
- **auto-sklearn** — also optimizes scikit-learn pipelines but uses Bayesian optimization; TPOT uses genetic programming
- **FLAML** — faster search via cost-frugal tuning; TPOT explores more pipeline structures
- **H2O AutoML** — requires the H2O server; TPOT runs in pure Python

## FAQ

**Q: How long does TPOT take to run?**
A: It depends on dataset size and the `generations` setting. Small datasets can finish in minutes; large ones may need hours. Use `max_time_mins` to set a budget.

**Q: Can TPOT use GPUs?**
A: TPOT itself is CPU-based, but you can include XGBoost with GPU support as a custom operator.

**Q: Does TPOT support deep learning?**
A: TPOT focuses on traditional ML pipelines. For neural architecture search, consider other tools.

**Q: How do I interpret the exported pipeline?**
A: TPOT exports a plain Python script with scikit-learn imports that you can read, modify, and run independently.

## Sources

- https://github.com/EpistasisLab/tpot
- http://epistasislab.github.io/tpot/

---

Source: https://tokrepo.com/en/workflows/asset-6c515b5a
Author: AI Open Source