# NVIDIA NeMo — Toolkit for Building and Training AI Models

> NVIDIA NeMo is a scalable framework for building, training, and fine-tuning large language models, speech recognition, and text-to-speech models. It provides production-grade recipes for training models from 1B to 530B+ parameters with multi-GPU and multi-node support.

## Quick Use

```bash
pip install nemo_toolkit[all]
python -c "
import nemo
import nemo.collections.asr as nemo_asr
# Load a pre-trained speech recognition model
model = nemo_asr.models.EncDecCTCModel.from_pretrained('nvidia/stt_en_conformer_ctc_small')
print(f'NeMo {nemo.__version__} loaded with {model.__class__.__name__}')
"
```

## Introduction

NVIDIA NeMo is a framework for researchers and developers who need to build, train, and deploy conversational AI and generative AI models at scale. It provides pre-built collections for LLMs, automatic speech recognition (ASR), text-to-speech (TTS), and NLP tasks, all optimized for NVIDIA GPU clusters with integrated distributed training.

## What NeMo Does

- Trains and fine-tunes large language models using tensor, pipeline, and expert parallelism
- Provides end-to-end ASR pipelines with pre-trained Conformer and FastConformer models
- Supports TTS model training including FastPitch, HiFi-GAN, and RADTTS
- Implements RLHF, DPO, and SFT alignment methods for instruction-tuning LLMs
- Exports models to NVIDIA TensorRT-LLM and Triton for optimized production serving

## Architecture Overview

NeMo is built on PyTorch and uses NVIDIA Megatron-LM for distributed LLM training with 3D parallelism (tensor, pipeline, data). Models are defined as collections of Neural Modules that connect via typed ports. A YAML-based configuration system (via Hydra/OmegaConf) controls every training parameter. NeMo Curator handles data preprocessing at scale, while NeMo Guardrails adds safety controls for deployed models.
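To make the YAML-based configuration concrete, here is a trimmed sketch of what a training config in the Hydra/OmegaConf style might look like. Key names and values are illustrative assumptions, not an exact NeMo recipe; consult the official recipes for real field names:

```yaml
# Hypothetical, trimmed NeMo-style training config (illustrative keys only).
trainer:
  devices: 8                        # GPUs per node
  num_nodes: 4
  precision: bf16
  max_steps: 100000
model:
  tensor_model_parallel_size: 2     # 3D parallelism knobs
  pipeline_model_parallel_size: 2
  optim:
    name: fused_adam
    lr: 1.0e-4
  data:
    data_prefix: /data/my_corpus    # hypothetical dataset path
    seq_length: 4096
```

Because Hydra drives the config, any of these fields can be overridden from the command line without editing the file.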
## Self-Hosting & Configuration

- Install via pip: `pip install nemo_toolkit[all]`; requires PyTorch and CUDA
- Use NVIDIA NGC containers for pre-configured environments: `nvcr.io/nvidia/nemo`
- Training configs are YAML files specifying model architecture, data, optimizer, and parallelism
- Multi-GPU training uses `torchrun` or NeMo's built-in launcher with Slurm integration
- Fine-tune with LoRA or P-tuning via config overrides: `model.peft.peft_scheme=lora`

## Key Features

- Scales from single GPU to thousands of GPUs with automatic parallelism strategies
- Pre-trained model zoo on NVIDIA NGC with models for ASR, TTS, NLP, and LLMs
- NeMo Curator for large-scale data deduplication, filtering, and quality scoring
- NeMo Guardrails for adding programmable safety rails to deployed LLM applications
- Seamless export to TensorRT-LLM for up to 8x inference speedup on NVIDIA hardware

## Comparison with Similar Tools

- **Hugging Face Transformers** — broader model coverage, but NeMo provides better multi-node training at scale
- **DeepSpeed** — focuses on distributed training optimization; NeMo provides full training recipes and model collections
- **Axolotl** — simpler fine-tuning setup, but NeMo handles pre-training and larger-scale training
- **Megatron-LM** — NeMo builds on Megatron and adds ASR, TTS, data curation, and configuration management
- **vLLM** — inference-only; NeMo covers the full lifecycle from data prep through training to deployment

## FAQ

**Q: Do I need NVIDIA GPUs to use NeMo?**
A: Yes, NeMo is optimized for NVIDIA GPUs. Training requires CUDA-capable GPUs, and many features leverage NVIDIA-specific libraries like cuDNN and NCCL.

**Q: Can NeMo fine-tune open-weight models like LLaMA?**
A: Yes, NeMo supports SFT, LoRA, and RLHF/DPO fine-tuning for LLaMA, Mistral, Gemma, and other architectures with pre-built recipes.
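The override shown above (`model.peft.peft_scheme=lora`) follows Hydra's dotted-path `key=value` convention. As a minimal stand-in, the plain-Python sketch below shows how such an override maps onto a nested config; this illustrates the syntax only and is not NeMo's or Hydra's actual override machinery, which adds type checking, interpolation, and structured configs:

```python
# Minimal illustration of a Hydra-style dotted override applied to a
# nested config dict. Not NeMo/Hydra internals -- syntax demo only.

def apply_override(config: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' override to a nested dict in place."""
    path, _, value = override.partition("=")
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        # Walk (or create) intermediate mappings along the dotted path
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config

config = {"model": {"peft": {"peft_scheme": "none"}, "optim": {"lr": "1e-4"}}}
apply_override(config, "model.peft.peft_scheme=lora")
print(config["model"]["peft"]["peft_scheme"])  # lora
```

In real NeMo usage the same override would be passed on the command line of a training script rather than applied in Python.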
**Q: How does NeMo handle data preprocessing?**
A: NeMo Curator provides GPU-accelerated data pipelines for deduplication, quality filtering, PII removal, and domain classification at petabyte scale.

**Q: Is NeMo suitable for speech applications?**
A: Yes, NeMo has extensive ASR and TTS collections with pre-trained models supporting 100+ languages and streaming inference.

## Sources

- https://github.com/NVIDIA/NeMo
- https://docs.nvidia.com/nemo-framework