# NVIDIA NeMo — Toolkit for Building and Training AI Models

> NVIDIA NeMo is a scalable framework for building, training, and fine-tuning large language models, speech recognition, and text-to-speech models. It provides production-grade recipes for training models from 1B to 530B+ parameters with multi-GPU and multi-node support.

## Quick Use

```bash
pip install nemo_toolkit[all]
python -c "
import nemo
import nemo.collections.asr as nemo_asr
# Load a pre-trained speech recognition model
model = nemo_asr.models.EncDecCTCModel.from_pretrained('nvidia/stt_en_conformer_ctc_small')
print(f'NeMo {nemo.__version__} loaded with {model.__class__.__name__}')
"
```

## Introduction

NVIDIA NeMo is a framework for researchers and developers who need to build, train, and deploy conversational AI and generative AI models at scale. It provides pre-built collections for LLMs, automatic speech recognition (ASR), text-to-speech (TTS), and NLP tasks, all optimized for NVIDIA GPU clusters with integrated distributed training.

## What NeMo Does

- Trains and fine-tunes large language models using tensor, pipeline, and expert parallelism
- Provides end-to-end ASR pipelines with pre-trained Conformer and FastConformer models
- Supports TTS model training including FastPitch, HiFi-GAN, and RADTTS
- Implements RLHF, DPO, and SFT alignment methods for instruction-tuning LLMs
- Exports models to NVIDIA TensorRT-LLM and Triton for optimized production serving

## Architecture Overview

NeMo is built on PyTorch and uses NVIDIA Megatron-LM for distributed LLM training with 3D parallelism (tensor, pipeline, data). Models are defined as collections of Neural Modules that connect via typed ports. A YAML-based configuration system (via Hydra/OmegaConf) controls every training parameter. NeMo Curator handles data preprocessing at scale, while NeMo Guardrails adds safety controls for deployed models.
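To make the YAML-based configuration concrete, here is a trimmed sketch of what a training config in the Hydra/OmegaConf style might look like. Key names and values are illustrative assumptions, not an exact NeMo recipe; consult the official recipes for real field names:

```yaml
# Hypothetical, trimmed NeMo-style training config (illustrative keys only).
trainer:
  devices: 8                        # GPUs per node
  num_nodes: 4
  precision: bf16
  max_steps: 100000
model:
  tensor_model_parallel_size: 2     # 3D parallelism knobs
  pipeline_model_parallel_size: 2
  optim:
    name: fused_adam
    lr: 1.0e-4
  data:
    data_prefix: /data/my_corpus    # hypothetical dataset path
    seq_length: 4096
```

Because Hydra drives the config, any of these fields can be overridden from the command line without editing the file.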
## Self-Hosting & Configuration

- Install via pip: `pip install nemo_toolkit[all]`; requires PyTorch and CUDA
- Use NVIDIA NGC containers for pre-configured environments: `nvcr.io/nvidia/nemo`
- Training configs are YAML files specifying model architecture, data, optimizer, and parallelism
- Multi-GPU training uses `torchrun` or NeMo's built-in launcher with Slurm integration
- Fine-tune with LoRA or P-tuning via config overrides: `model.peft.peft_scheme=lora`

## Key Features

- Scales from single GPU to thousands of GPUs with automatic parallelism strategies
- Pre-trained model zoo on NVIDIA NGC with models for ASR, TTS, NLP, and LLMs
- NeMo Curator for large-scale data deduplication, filtering, and quality scoring
- NeMo Guardrails for adding programmable safety rails to deployed LLM applications
- Seamless export to TensorRT-LLM for up to 8x inference speedup on NVIDIA hardware

## Comparison with Similar Tools

- **Hugging Face Transformers** — broader model coverage, but NeMo provides better multi-node training at scale
- **DeepSpeed** — focuses on distributed training optimization; NeMo provides full training recipes and model collections
- **Axolotl** — simpler fine-tuning setup, but NeMo handles pre-training and larger-scale training
- **Megatron-LM** — NeMo builds on Megatron and adds ASR, TTS, data curation, and configuration management
- **vLLM** — inference-only; NeMo covers the full lifecycle from data prep through training to deployment

## FAQ

**Q: Do I need NVIDIA GPUs to use NeMo?**
A: Yes, NeMo is optimized for NVIDIA GPUs. Training requires CUDA-capable GPUs, and many features leverage NVIDIA-specific libraries like cuDNN and NCCL.

**Q: Can NeMo fine-tune open-weight models like LLaMA?**
A: Yes, NeMo supports SFT, LoRA, and RLHF/DPO fine-tuning for LLaMA, Mistral, Gemma, and other architectures with pre-built recipes.
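The override shown above (`model.peft.peft_scheme=lora`) follows Hydra's dotted-path `key=value` convention. As a minimal stand-in, the plain-Python sketch below shows how such an override maps onto a nested config; this illustrates the syntax only and is not NeMo's or Hydra's actual override machinery, which adds type checking, interpolation, and structured configs:

```python
# Minimal illustration of a Hydra-style dotted override applied to a
# nested config dict. Not NeMo/Hydra internals -- syntax demo only.

def apply_override(config: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' override to a nested dict in place."""
    path, _, value = override.partition("=")
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        # Walk (or create) intermediate mappings along the dotted path
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config

config = {"model": {"peft": {"peft_scheme": "none"}, "optim": {"lr": "1e-4"}}}
apply_override(config, "model.peft.peft_scheme=lora")
print(config["model"]["peft"]["peft_scheme"])  # lora
```

In real NeMo usage the same override would be passed on the command line of a training script rather than applied in Python.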
**Q: How does NeMo handle data preprocessing?**
A: NeMo Curator provides GPU-accelerated data pipelines for deduplication, quality filtering, PII removal, and domain classification at petabyte scale.

**Q: Is NeMo suitable for speech applications?**
A: Yes, NeMo has extensive ASR and TTS collections with pre-trained models supporting 100+ languages and streaming inference.

## Sources

- https://github.com/NVIDIA/NeMo
- https://docs.nvidia.com/nemo-framework