# AutoGluon — AutoML for Tabular, Time-Series, Text, and Image Data

> AutoGluon is AWS's AutoML toolkit. With one `.fit()` call it trains state-of-the-art ensembles on tabular, time-series, text, and image data — often beating hand-tuned models written by ML engineers.

## Install

```bash
pip install autogluon
```

## Quick Use

Save the following as a script file and run it:

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# One call: preprocessing, model training, tuning, and ensembling
predictor = TabularPredictor(label="target").fit(train, time_limit=600)

leaderboard = predictor.leaderboard(test)  # ranks all trained models
predictions = predictor.predict(test)
```

## Introduction

AutoGluon, developed by AWS researchers, was created to democratize ML. With a single `fit()` call, it trains an ensemble of LightGBM, XGBoost, CatBoost, neural networks, and KNN on your tabular data — automatically handling missing values, categorical encoding, hyperparameter tuning, and stacking. With over 10,000 GitHub stars, AutoGluon has powered winning Kaggle solutions and consistently beats hand-tuned baselines.

It also supports time-series forecasting (`TimeSeriesPredictor`) and text and image classification (`MultiModalPredictor`) — all with the same simple API.

## What AutoGluon Does

AutoGluon trains many models in parallel, picks the best, and stacks them into a high-performing ensemble — all within a wall-clock time budget you specify. For each modality (Tabular, TimeSeries, MultiModal), it embeds best practices (preprocessing, model selection, ensembling) into a one-liner.

## Architecture Overview

```
fit(train_df, label="...")
        |
[Auto preprocessing]
    type detection, missing values, categorical encoding, normalization
        |
[Model zoo training] (parallel)
    LightGBM, XGBoost, CatBoost, RandomForest, ExtraTrees, KNN,
    NN_TORCH (MLP), TabPFN, etc.
        |
[Hyperparameter search] (optional, time-budgeted)
        |
[Multi-layer stacking + ensembling]
        |
[Best-in-leaderboard predictor]
        |
predict(test_df)
```

## Other Modalities & Configuration

```python
# Time-series forecasting
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame
import pandas as pd

df = pd.read_csv("sales.csv")  # columns: item_id, timestamp, target
ts_df = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp"
)

predictor = TimeSeriesPredictor(prediction_length=30, target="target").fit(
    ts_df, presets="best_quality", time_limit=1800
)
forecasts = predictor.predict(ts_df)

# MultiModal: text + image + tabular
from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label="target").fit(train_df, time_limit=600)
```

```python
# Configure quality vs. speed presets
predictor = TabularPredictor(label="y").fit(
    train,
    presets="best_quality",  # or "high_quality", "good_quality", "medium_quality"
    time_limit=3600,
    num_gpus=1,
)

# Inspect models
predictor.leaderboard(silent=True)
predictor.feature_importance(test)
```

## Key Features

- **One `.fit()` call** — training, tuning, ensembling all automatic
- **Tabular** — beats hand-tuned XGBoost in head-to-head benchmarks
- **TimeSeries** — auto-trains 10+ forecasters (Prophet, DeepAR, TFT, ...) and ensembles them
- **MultiModal** — text + image + tabular in one model via deep learning
- **Time budgets** — set `time_limit`; AutoGluon picks the best models that fit
- **Quality presets** — `best_quality` / `high_quality` / `good_quality` / `medium_quality`
- **Deployable** — `.save()` / `.load()` for production prediction
- **Feature importance** — built-in interpretability for tabular models

## Comparison with Similar Tools

| Feature | AutoGluon | H2O AutoML | TPOT | FLAML | scikit-learn baseline |
|---|---|---|---|---|---|
| Tabular performance | Excellent | Excellent | Good | Excellent | Manual |
| Time-series | Yes | Limited | No | Yes | Manual |
| Text/Image | Yes (MultiModal) | Limited | No | No | Manual |
| Wall-clock budgeting | Yes | Yes | Yes | Yes | N/A |
| Stacking | Yes | Yes | Yes | Limited | Manual |
| Best for | One-shot Kaggle-grade ML | H2O ecosystem | Pipeline search | Speed-focused AutoML | Custom modeling |

## FAQ

**Q: AutoGluon vs. sklearn pipelines?**
A: AutoGluon hides the pipeline details and ensembles many models. For exploratory ML or a Kaggle-quality first baseline, AutoGluon wins. For carefully crafted production models, sklearn (or PyTorch) gives full control.

**Q: Does it support GPUs?**
A: Yes. Tabular gets modest speedups; MultiModal (deep learning on text/image) benefits dramatically from GPUs.

**Q: Can I use the predictions in production?**
A: Yes. `predictor.save("dir")` then `TabularPredictor.load("dir")` for serving. Models work in any Python environment with AutoGluon installed.

**Q: How long does training take?**
A: You decide via `time_limit`. 5–10 minutes gives a solid baseline; 1+ hour with `best_quality` produces top-tier models. AutoGluon picks the best within your budget.

## Sources

- GitHub: https://github.com/autogluon/autogluon
- Docs: https://auto.gluon.ai
- Company: AWS
- License: Apache-2.0

---
Source: https://tokrepo.com/en/workflows/e0c86ffc-37db-11f1-9bc6-00163e2b0d79
Author: Script Depot
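
As a closing illustration of the wall-clock budgeting described earlier: AutoGluon fits as many candidate models as the `time_limit` allows. The sketch below is a deliberately simplified, pure-Python model of that scheduling idea; the model names and per-model cost estimates are invented, and this is not AutoGluon's actual scheduler.

```python
# Toy sketch: greedily schedule candidate models whose estimated
# training cost (in seconds) still fits in the remaining time budget.
# Names and costs are invented for illustration.

def fit_within_budget(candidates, time_limit):
    """Return the candidate model names that fit the budget, in order."""
    remaining = time_limit
    trained = []
    for name, est_cost in candidates:
        if est_cost <= remaining:
            trained.append(name)   # "train" the model
            remaining -= est_cost  # spend its share of the budget
        # candidates that do not fit are skipped entirely
    return trained

candidates = [
    ("lightgbm", 60),
    ("catboost", 120),
    ("neural_net", 600),
    ("knn", 30),
]
print(fit_within_budget(candidates, time_limit=300))
# → ['lightgbm', 'catboost', 'knn']
```

The slow neural network is skipped because it would blow the budget, while the cheap KNN after it still fits; a larger `time_limit` admits more (and bigger) models, which is why longer budgets yield stronger leaderboards.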
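
The "multi-layer stacking + ensembling" step in the architecture diagram can be sketched in the same spirit. AutoGluon's weighted ensembling follows the idea of greedy ensemble selection: repeatedly add whichever model's predictions most improve the ensemble average, with repeats acting as weights. The numbers below are invented, and this is an illustration of the concept rather than AutoGluon's implementation.

```python
# Toy sketch of greedy ensemble selection: repeatedly add (with
# replacement) the model whose inclusion most lowers ensemble error.
# All predictions and targets below are invented for illustration.

def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

def greedy_ensemble(model_preds, truth, rounds):
    chosen = []
    for _ in range(rounds):
        best_name, best_err = None, float("inf")
        for name in model_preds:
            trial = chosen + [name]
            # Candidate ensemble = unweighted average of selected models
            ens = [sum(model_preds[m][i] for m in trial) / len(trial)
                   for i in range(len(truth))]
            err = mse(ens, truth)
            if err < best_err:
                best_name, best_err = name, err
        chosen.append(best_name)
    return chosen  # repeat counts act as ensemble weights

truth = [1.0, 2.0, 3.0, 4.0]
model_preds = {
    "lightgbm": [1.1, 2.1, 2.9, 4.2],
    "catboost": [0.9, 1.8, 3.2, 3.9],
    "knn":      [1.5, 2.5, 2.5, 4.5],
}
print(greedy_ensemble(model_preds, truth, rounds=4))
```

On this toy data the weak KNN is never selected, while the two strong but differently-biased models are mixed: their errors partially cancel, so the ensemble beats either model alone. That cancellation is the intuition behind stacking many diverse models.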