# GPT-NeoX — Open-Source Large Language Model Training Library

> A GPU-optimized library by EleutherAI for training large-scale autoregressive language models. GPT-NeoX powered the training of GPT-NeoX-20B and Pythia, providing the open-source community with tools for billion-parameter model training.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:


## Quick Use
```bash
git clone https://github.com/EleutherAI/gpt-neox.git && cd gpt-neox
pip install -r requirements/requirements.txt
python deepy.py train.py configs/small.yml configs/local_setup.yml
```

## Introduction
GPT-NeoX is EleutherAI's distributed training framework, built on top of Megatron-LM and DeepSpeed. It was designed to make training billion-parameter language models accessible to the open-source research community, and it was used to train GPT-NeoX-20B and the Pythia model suite.

## What GPT-NeoX Does
- Trains autoregressive transformer language models at scales from millions to tens of billions of parameters
- Combines Megatron-style tensor parallelism with DeepSpeed ZeRO for efficient distributed training
- Supports rotary positional embeddings, parallel attention and feed-forward (FFN) blocks, and other modern LLM architecture choices
- Provides YAML-based configuration for full control over model architecture and training hyperparameters
- Integrates with EleutherAI's lm-evaluation-harness for benchmarking trained models
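As a rough illustration of the YAML-driven setup, the shell snippet below writes a small model config that turns on rotary embeddings and the parallel attention/FFN residual. The file name `configs/my_small_model.yml` and every value in it are hypothetical. The key names follow the style of the repository's `configs/*.yml` files as we understand them, so check each one against the argument documentation for the GPT-NeoX version you run.

```bash
#!/usr/bin/env bash
# Write a hypothetical small-model config. GPT-NeoX configs are YAML written in a
# JSON-like flow style; '#' comments are allowed inside the braces.
mkdir -p configs
cat > configs/my_small_model.yml <<'EOF'
{
  # parallelism layout (1 = no pipeline / tensor split)
  "pipe_parallel_size": 1,
  "model_parallel_size": 1,

  # model shape
  "num_layers": 12,
  "hidden_size": 768,
  "num_attention_heads": 12,
  "seq_length": 2048,
  "max_position_embeddings": 2048,

  # architecture choices
  "pos_emb": "rotary",        # rotary positional embeddings
  "rotary_pct": 0.25,         # fraction of each head's dimensions that get rotary
  "gpt_j_residual": true,     # attention and FFN computed in parallel (GPT-J style)
  "norm": "layernorm",

  # optimizer and ZeRO stage
  "optimizer": {"type": "Adam", "params": {"lr": 0.0006, "betas": [0.9, 0.95], "eps": 1.0e-8}},
  "zero_optimization": {"stage": 1},

  # batch size and precision
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 1,
  "fp16": {"enabled": true}
}
EOF
echo "wrote configs/my_small_model.yml"
```

A file like this would be passed alongside a setup file, for example `python deepy.py train.py configs/my_small_model.yml configs/local_setup.yml`. The setup file supplies data paths, tokenizer files, and logging directories.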

## Architecture Overview
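GPT-NeoX combines NVIDIA Megatron-LM's tensor and pipeline parallelism with Microsoft DeepSpeed's ZeRO optimizer. Tensor and pipeline parallelism split the model's parameters across GPUs. ZeRO then shards optimizer states across data-parallel ranks, and at higher stages it also shards gradients and parameters. Together these let you train models too large to fit in a single GPU's memory. Model architecture and training settings are specified in YAML config files. You can pass several files at launch, and they are combined into one configuration; any option left unset falls back to the library's defaults.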
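To make the memory argument concrete, the sketch below estimates per-GPU memory for weights, gradients, and Adam states under each ZeRO stage. It uses the ZeRO paper's mixed-precision accounting of 2 + 2 + 12 bytes per parameter and ignores activations and temporary buffers. The 20B-parameter model, 4-way pipeline split, 2-way tensor split, and 12 data-parallel replicas are illustrative assumptions. DeepSpeed may also restrict which ZeRO stages can be combined with pipeline parallelism, so check your version.

```bash
#!/usr/bin/env bash
# Approximate per-GPU bytes for model states in mixed-precision Adam training:
#   2 bytes fp16 weights + 2 bytes fp16 grads + 12 bytes fp32 (master weights,
#   momentum, variance) per parameter. Activations are NOT included.
#   $1 params: parameters held by one tensor/pipeline rank (total / (pp * mp))
#   $2 stage:  ZeRO stage 0-3
#   $3 dp:     data-parallel degree that ZeRO partitions across
zero_bytes_per_gpu() {
  local params=$1 stage=$2 dp=$3
  case "$stage" in
    0) echo $(( 16 * params )) ;;                    # nothing partitioned
    1) echo $(( 4 * params + 12 * params / dp )) ;;  # optimizer states split
    2) echo $(( 2 * params + 14 * params / dp )) ;;  # + gradients split
    3) echo $(( 16 * params / dp )) ;;               # + weights split
    *) echo "unknown ZeRO stage: $stage" >&2; return 1 ;;
  esac
}

# Illustrative: 20B parameters split 4-way (pipeline) x 2-way (tensor), dp = 12.
params_per_rank=$(( 20000000000 / (4 * 2) ))
for stage in 0 1 2 3; do
  mb=$(( $(zero_bytes_per_gpu "$params_per_rank" "$stage" 12) / 1000000 ))
  echo "ZeRO stage $stage: ~${mb} MB per GPU"
done
# prints ~40000, ~12500, ~7916 and ~3333 MB for stages 0-3
```

Bash arithmetic is integer-only, so the code multiplies before dividing to avoid truncating early.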

## Self-Hosting & Configuration
- Requires Python 3.8+, PyTorch 1.8+, and NVIDIA GPUs with NCCL
- Multi-node training is launched over SSH using a hostfile that lists each node, or through a cluster scheduler such as SLURM
- All architecture and training options are set via YAML config files
- Pre-built Docker containers are available for reproducible environments
- Data preprocessing scripts tokenize raw text into binary indexed datasets (`.bin`/`.idx` file pairs)
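A multi-node launch needs a list of machines. DeepSpeed's launcher reads a hostfile with one `<hostname> slots=<gpus>` line per node. In the GPT-NeoX versions we are aware of, the launcher looks for this file at `/job/hostfile` unless a `hostfile` path is set in the config; verify this for your version. The helper `write_hostfile` and the node names below are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a DeepSpeed-style hostfile, one line per node.
#   usage: write_hostfile OUTFILE GPUS_PER_NODE HOST [HOST ...]
write_hostfile() {
  local out=$1 gpus=$2
  shift 2
  : > "$out"                                   # create or truncate the file
  for host in "$@"; do
    printf '%s slots=%d\n' "$host" "$gpus" >> "$out"
  done
}

# Placeholder node names; substitute the hostnames of your cluster.
write_hostfile ./hostfile 8 node-01 node-02
cat ./hostfile
```

Every node listed must be reachable over passwordless SSH from the launching machine for an SSH-based launch to work.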

## Key Features
- Scales from a single GPU to hundreds of GPUs with model and data parallelism
- YAML-driven configuration makes experiments reproducible and easy to iterate
- Was used to train the Pythia model suite, which has been used in hundreds of research papers
- Supports FlashAttention, fused kernels, and mixed-precision training
- Evaluation pipeline integrates with EleutherAI's lm-evaluation-harness

## Comparison with Similar Tools
- **Megatron-LM** — NVIDIA's training framework; GPT-NeoX adds DeepSpeed integration and simpler configuration
- **DeepSpeed** — optimization library; GPT-NeoX provides the full model definition and training loop on top of DeepSpeed
- **LitGPT** — Lightning-based GPT training; simpler setup but less flexibility at very large scale
- **llm.c** — minimal C/CUDA implementation; GPT-NeoX targets production-scale distributed training

## FAQ
**Q: Can I train a model from scratch with GPT-NeoX?**
A: Yes. It supports full pre-training from raw text data, including tokenization, data sharding, and distributed training.
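The tokenization step can be sketched as follows. First build a JSONL corpus with one document per line, then run the repository's preprocessing script on it. The corpus, output prefix, and vocab/merge file paths are hypothetical placeholders. The flags reflect the script's usage text as we recall it, so confirm them with `python tools/datasets/preprocess_data.py --help` before relying on them.

```bash
#!/usr/bin/env bash
# Build a tiny JSONL corpus: one JSON object per line, text under the "text" key.
mkdir -p data
printf '%s\n' \
  '{"text": "The quick brown fox jumps over the lazy dog."}' \
  '{"text": "A second, independent training document."}' \
  > data/my_corpus.jsonl

# Tokenize into Megatron-style indexed files (.bin/.idx). Run from the gpt-neox
# repo root; tokenizer type and vocab/merge paths here are assumptions.
if [ -f tools/datasets/preprocess_data.py ]; then
  python tools/datasets/preprocess_data.py \
    --input data/my_corpus.jsonl \
    --output-prefix data/my_corpus \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file data/gpt2-vocab.json \
    --merge-file data/gpt2-merges.txt \
    --dataset-impl mmap \
    --append-eod
else
  echo "tools/datasets/preprocess_data.py not found; run from the gpt-neox repo root" >&2
fi
```

The training config's data path then points at the generated file prefix. Check the script's output for the exact file names it writes.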

**Q: What models were trained with GPT-NeoX?**
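A: GPT-NeoX-20B and the Pythia suite (70M to 12B parameters), among others. Some downstream models, such as Databricks' Dolly 2.0, were fine-tuned from Pythia checkpoints rather than trained from scratch with GPT-NeoX.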

**Q: How many GPUs do I need?**
A: A small model can train on a single GPU. The original GPT-NeoX-20B training run used 96 NVIDIA A100 (40 GB) GPUs across 12 nodes of 8 GPUs each.
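The sketch below shows how a GPU count translates into a training layout. It computes the data-parallel degree as world size / (pipeline × tensor) and the global batch size using DeepSpeed's relationship: global batch = micro-batch per GPU × gradient-accumulation steps × data-parallel degree. The 4-way pipeline split, 2-way tensor split, and batch settings are illustrative assumptions, not values from a released GPT-NeoX config.

```bash
#!/usr/bin/env bash
# Illustrative layout: 12 nodes x 8 GPUs, split by pipeline (pp) and tensor (mp) parallelism.
nodes=12; gpus_per_node=8; pp=4; mp=2
micro_batch=4; grad_accum=32   # hypothetical per-GPU micro-batch and accumulation steps

world=$(( nodes * gpus_per_node ))
if (( world % (pp * mp) != 0 )); then
  echo "world size $world is not divisible by pp*mp=$(( pp * mp ))" >&2
  exit 1
fi
dp=$(( world / (pp * mp) ))                      # replicas left for data parallelism
global_batch=$(( micro_batch * grad_accum * dp ))  # sequences per optimizer step
echo "world=$world dp=$dp global_batch=$global_batch"
# prints: world=96 dp=12 global_batch=1536
```

In a GPT-NeoX config, the micro-batch and accumulation values correspond to the DeepSpeed-style keys `train_micro_batch_size_per_gpu` and `gradient_accumulation_steps`.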

**Q: Is GPT-NeoX still actively developed?**
A: The core codebase is stable. EleutherAI continues to use and maintain it for new research projects.

## Sources
- https://github.com/EleutherAI/gpt-neox
- https://arxiv.org/abs/2204.06745

---
Source: https://tokrepo.com/en/workflows/gpt-neox-open-source-large-language-model-training-library-f07fee9a
Author: AI Open Source