# AutoGluon — AutoML for Tabular, Time-Series, Text, and Image Data

> AutoGluon is AWS's AutoML toolkit. With one `.fit()` call it trains state-of-the-art ensembles on tabular, time-series, text, and image data — often beating hand-tuned models written by ML engineers.

## Install

```bash
pip install autogluon
```

## Quick Use

Save the following as a script file and run it:

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# One call: preprocessing, model training, tuning, and ensembling
predictor = TabularPredictor(label="target").fit(train, time_limit=600)

leaderboard = predictor.leaderboard(test)  # ranks all trained models
predictions = predictor.predict(test)
```

## Introduction

AutoGluon, developed by AWS researchers, was created to democratize ML. With a single `fit()` call, it trains an ensemble of LightGBM, XGBoost, CatBoost, neural networks, and KNN on your tabular data — automatically handling missing values, categorical encoding, hyperparameter tuning, and stacking. With over 10,000 GitHub stars, AutoGluon has powered winning Kaggle solutions and consistently beats hand-tuned baselines.

It also supports time-series forecasting (`TimeSeriesPredictor`) and text and image classification (`MultiModalPredictor`) — all with the same simple API.

## What AutoGluon Does

AutoGluon trains many models in parallel, picks the best, and stacks them into a high-performing ensemble — all within a wall-clock time budget you specify. For each modality (Tabular, TimeSeries, MultiModal), it embeds best practices (preprocessing, model selection, ensembling) into a one-liner.

## Architecture Overview

```
fit(train_df, label="...")
        |
[Auto preprocessing]
    type detection, missing values, categorical encoding, normalization
        |
[Model zoo training] (parallel)
    LightGBM, XGBoost, CatBoost, RandomForest, ExtraTrees, KNN,
    NN_TORCH (MLP), TabPFN, etc.
        |
[Hyperparameter search] (optional, time-budgeted)
        |
[Multi-layer stacking + ensembling]
        |
[Best-in-leaderboard predictor]
        |
predict(test_df)
```

## Other Modalities & Configuration

```python
# Time-series forecasting
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame
import pandas as pd

df = pd.read_csv("sales.csv")  # columns: item_id, timestamp, target
ts_df = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp"
)

predictor = TimeSeriesPredictor(prediction_length=30, target="target").fit(
    ts_df, presets="best_quality", time_limit=1800
)
forecasts = predictor.predict(ts_df)

# MultiModal: text + image + tabular
from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label="target").fit(train_df, time_limit=600)
```

```python
# Configure quality vs. speed presets
predictor = TabularPredictor(label="y").fit(
    train,
    presets="best_quality",  # or "high_quality", "good_quality", "medium_quality"
    time_limit=3600,
    num_gpus=1,
)

# Inspect models
predictor.leaderboard(silent=True)
predictor.feature_importance(test)
```

## Key Features

- **One `.fit()` call** — training, tuning, ensembling all automatic
- **Tabular** — beats hand-tuned XGBoost in head-to-head benchmarks
- **TimeSeries** — auto-trains 10+ forecasters (Prophet, DeepAR, TFT, ...) and ensembles them
- **MultiModal** — text + image + tabular in one model via deep learning
- **Time budgets** — set `time_limit`; AutoGluon picks the best models that fit
- **Quality presets** — `best_quality` / `high_quality` / `good_quality` / `medium_quality`
- **Deployable** — `.save()` / `.load()` for production prediction
- **Feature importance** — built-in interpretability for tabular models

## Comparison with Similar Tools

| Feature | AutoGluon | H2O AutoML | TPOT | FLAML | scikit-learn baseline |
|---|---|---|---|---|---|
| Tabular performance | Excellent | Excellent | Good | Excellent | Manual |
| Time-series | Yes | Limited | No | Yes | Manual |
| Text/Image | Yes (MultiModal) | Limited | No | No | Manual |
| Wall-clock budgeting | Yes | Yes | Yes | Yes | N/A |
| Stacking | Yes | Yes | Yes | Limited | Manual |
| Best for | One-shot Kaggle-grade ML | H2O ecosystem | Pipeline search | Speed-focused AutoML | Custom modeling |

## FAQ

**Q: AutoGluon vs. sklearn pipelines?**
A: AutoGluon hides the pipeline details and ensembles many models. For exploratory ML or a Kaggle-quality first baseline, AutoGluon wins. For carefully crafted production models, sklearn (or PyTorch) gives full control.

**Q: Does it support GPUs?**
A: Yes. Tabular gets modest speedups; MultiModal (deep learning on text/image) benefits dramatically from GPUs.

**Q: Can I use the predictions in production?**
A: Yes. `predictor.save("dir")` then `TabularPredictor.load("dir")` for serving. Models work in any Python environment with AutoGluon installed.

**Q: How long does training take?**
A: You decide via `time_limit`. 5–10 minutes gives a solid baseline; 1+ hour with `best_quality` produces top-tier models. AutoGluon picks the best within your budget.

## Sources

- GitHub: https://github.com/autogluon/autogluon
- Docs: https://auto.gluon.ai
- Company: AWS
- License: Apache-2.0

---
Source: https://tokrepo.com/en/workflows/e0c86ffc-37db-11f1-9bc6-00163e2b0d79
Author: Script Depot
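
As a closing illustration of the wall-clock budgeting described earlier: AutoGluon fits as many candidate models as the `time_limit` allows. The sketch below is a deliberately simplified, pure-Python model of that scheduling idea; the model names and per-model cost estimates are invented, and this is not AutoGluon's actual scheduler.

```python
# Toy sketch: greedily schedule candidate models whose estimated
# training cost (in seconds) still fits in the remaining time budget.
# Names and costs are invented for illustration.

def fit_within_budget(candidates, time_limit):
    """Return the candidate model names that fit the budget, in order."""
    remaining = time_limit
    trained = []
    for name, est_cost in candidates:
        if est_cost <= remaining:
            trained.append(name)   # "train" the model
            remaining -= est_cost  # spend its share of the budget
        # candidates that do not fit are skipped entirely
    return trained

candidates = [
    ("lightgbm", 60),
    ("catboost", 120),
    ("neural_net", 600),
    ("knn", 30),
]
print(fit_within_budget(candidates, time_limit=300))
# → ['lightgbm', 'catboost', 'knn']
```

The slow neural network is skipped because it would blow the budget, while the cheap KNN after it still fits; a larger `time_limit` admits more (and bigger) models, which is why longer budgets yield stronger leaderboards.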
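
The "multi-layer stacking + ensembling" step in the architecture diagram can be sketched in the same spirit. AutoGluon's weighted ensembling follows the idea of greedy ensemble selection: repeatedly add whichever model's predictions most improve the ensemble average, with repeats acting as weights. The numbers below are invented, and this is an illustration of the concept rather than AutoGluon's implementation.

```python
# Toy sketch of greedy ensemble selection: repeatedly add (with
# replacement) the model whose inclusion most lowers ensemble error.
# All predictions and targets below are invented for illustration.

def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

def greedy_ensemble(model_preds, truth, rounds):
    chosen = []
    for _ in range(rounds):
        best_name, best_err = None, float("inf")
        for name in model_preds:
            trial = chosen + [name]
            # Candidate ensemble = unweighted average of selected models
            ens = [sum(model_preds[m][i] for m in trial) / len(trial)
                   for i in range(len(truth))]
            err = mse(ens, truth)
            if err < best_err:
                best_name, best_err = name, err
        chosen.append(best_name)
    return chosen  # repeat counts act as ensemble weights

truth = [1.0, 2.0, 3.0, 4.0]
model_preds = {
    "lightgbm": [1.1, 2.1, 2.9, 4.2],
    "catboost": [0.9, 1.8, 3.2, 3.9],
    "knn":      [1.5, 2.5, 2.5, 4.5],
}
print(greedy_ensemble(model_preds, truth, rounds=4))
```

On this toy data the weak KNN is never selected, while the two strong but differently-biased models are mixed: their errors partially cancel, so the ensemble beats either model alone. That cancellation is the intuition behind stacking many diverse models.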