# GPT-NeoX: Open-Source Large Language Model Training Library

> A GPU-optimized library by EleutherAI for training large-scale autoregressive language models. GPT-NeoX powered the training of GPT-NeoX-20B and Pythia, providing the open-source community with tools for billion-parameter model training.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

```bash
git clone https://github.com/EleutherAI/gpt-neox.git && cd gpt-neox
pip install -r requirements/requirements.txt
python deepy.py train.py configs/small.yml configs/local_setup.yml
```

## Introduction

GPT-NeoX is EleutherAI's distributed training framework built on top of Megatron-LM and DeepSpeed. It was designed to make training billion-parameter language models accessible to the open-source research community, and it produced the GPT-NeoX-20B and Pythia model suites.

## What GPT-NeoX Does

- Trains autoregressive transformer language models at scales from millions to tens of billions of parameters
- Combines Megatron-style tensor parallelism with DeepSpeed ZeRO for efficient distributed training
- Supports rotary positional embeddings, parallel attention-FFN, and other modern LLM architecture choices
- Provides YAML-based configuration for full control over model architecture and training hyperparameters
- Includes evaluation harness integration for benchmarking trained models

## Architecture Overview

GPT-NeoX fuses NVIDIA Megatron-LM's tensor and pipeline parallelism with Microsoft DeepSpeed's ZeRO optimizer stages. The training engine distributes model parameters, gradients, and optimizer states across GPUs, enabling models that exceed single-GPU memory. Model architecture and training settings are specified through composable YAML configs that override defaults hierarchically.
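To make the memory argument concrete, here is a minimal sketch of the per-GPU cost of model states (parameters, gradients, optimizer states) under mixed-precision Adam, using the byte accounting from the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state. The function name and arguments are hypothetical and not part of GPT-NeoX or DeepSpeed. Activations, communication buffers, and fragmentation are ignored, and `num_params` is assumed to be the parameter count held by one model-parallel shard.

```python
# Rough per-GPU memory for model states under mixed-precision Adam.
# Byte counts follow the accounting in the ZeRO paper; this is an
# illustrative estimate, not a GPT-NeoX or DeepSpeed API.

BYTES_PARAMS_FP16 = 2   # fp16 copy of the weights
BYTES_GRADS_FP16 = 2    # fp16 gradients
BYTES_OPTIMIZER = 12    # fp32 master weights + Adam momentum + Adam variance


def model_state_bytes_per_gpu(num_params: int, dp_degree: int, zero_stage: int) -> float:
    """Estimate bytes of model state per GPU (hypothetical helper).

    num_params: parameters held by one model-parallel shard
    dp_degree:  number of data-parallel replicas ZeRO partitions across
    zero_stage: 0 (no partitioning), 1, 2, or 3
    """
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2, or 3")
    if dp_degree < 1:
        raise ValueError("dp_degree must be at least 1")

    params = BYTES_PARAMS_FP16 * num_params
    grads = BYTES_GRADS_FP16 * num_params
    optimizer = BYTES_OPTIMIZER * num_params

    if zero_stage >= 1:      # stage 1 partitions optimizer states
        optimizer /= dp_degree
    if zero_stage >= 2:      # stage 2 also partitions gradients
        grads /= dp_degree
    if zero_stage >= 3:      # stage 3 also partitions parameters
        params /= dp_degree

    return params + grads + optimizer


if __name__ == "__main__":
    # Example inputs are arbitrary: a 1B-parameter shard, 8 data-parallel ranks.
    for stage in range(4):
        gib = model_state_bytes_per_gpu(1_000_000_000, 8, stage) / 2**30
        print(f"ZeRO stage {stage}: {gib:.1f} GiB of model state per GPU")
```

Tensor and pipeline parallelism shrink `num_params` per shard, while the ZeRO stage determines how much of the remaining state is replicated across data-parallel ranks.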
## Self-Hosting & Configuration

- Requires Python 3.8+, PyTorch 1.8+, and NVIDIA GPUs with NCCL
- Multi-node training uses SSH or a cluster scheduler like SLURM
- All architecture and training options are set via YAML config files
- Pre-built Docker containers are available for reproducible environments
- Data preprocessing scripts convert raw text to tokenized binary shards

## Key Features

- Scales from a single GPU to hundreds of GPUs with model and data parallelism
- YAML-driven configuration makes experiments reproducible and easy to iterate
- Was used to train the Pythia model suite, which is cited in hundreds of research papers
- Supports FlashAttention, fused kernels, and mixed-precision training
- Evaluation pipeline integrates with EleutherAI's lm-evaluation-harness

## Comparison with Similar Tools

- **Megatron-LM**: NVIDIA's training framework; GPT-NeoX adds DeepSpeed integration and simpler configuration
- **DeepSpeed**: optimization library; GPT-NeoX provides the full model definition and training loop on top of DeepSpeed
- **LitGPT**: Lightning-based GPT training; simpler setup but less flexibility at very large scale
- **llm.c**: minimal C/CUDA implementation; GPT-NeoX targets production-scale distributed training

## FAQ

**Q: Can I train a model from scratch with GPT-NeoX?**
A: Yes. It supports full pre-training from raw text data, including tokenization, data sharding, and distributed training.

**Q: What models were trained with GPT-NeoX?**
A: GPT-NeoX-20B and the Pythia suite (70M to 12B), among others. Downstream models such as Dolly 2.0 were fine-tuned from Pythia checkpoints rather than trained with GPT-NeoX directly.

**Q: How many GPUs do I need?**
A: A small model can train on a single GPU. GPT-NeoX-20B was trained on 96 A100 GPUs.

**Q: Is GPT-NeoX still actively developed?**
A: The core codebase is stable. EleutherAI continues to use and maintain it for new research projects.
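As a rough companion to the GPU-count question: when data, tensor, and pipeline parallelism are combined, the total number of GPUs is the product of the three degrees, so the data-parallel degree is whatever remains after the model-parallel split. The sketch below shows that arithmetic. The function name and the example degrees are illustrative assumptions, not GPT-NeoX's API and not a statement of the published GPT-NeoX-20B configuration.

```python
# Illustrative arithmetic for a combined data/tensor/pipeline layout.
# Hypothetical helper; not part of GPT-NeoX.

def data_parallel_degree(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return how many data-parallel replicas fit on `world_size` GPUs."""
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be at least 1")
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {gpus_per_replica}"
        )
    return world_size // gpus_per_replica


if __name__ == "__main__":
    # Assumed example: 96 GPUs, each model replica split over 2 x 4 GPUs.
    print(data_parallel_degree(world_size=96, tensor_parallel=2, pipeline_parallel=4))
```

A single-GPU run is the degenerate case where all three degrees equal 1.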
## Sources

- https://github.com/EleutherAI/gpt-neox
- https://arxiv.org/abs/2204.06745

---

Source: https://tokrepo.com/en/workflows/gpt-neox-open-source-large-language-model-training-library-f07fee9a

Author: AI Open Source