Configs · May 2, 2026 · 3 min read

GPT-NeoX — Open-Source Large Language Model Training Library

A GPU-optimized library by EleutherAI for training large-scale autoregressive language models. GPT-NeoX powered the training of GPT-NeoX-20B and Pythia, providing the open-source community with tools for billion-parameter model training.

Introduction

GPT-NeoX is EleutherAI's distributed training framework built on top of Megatron-LM and DeepSpeed. It was designed to make training billion-parameter language models accessible to the open-source research community, and it produced the GPT-NeoX-20B and Pythia model suites.

What GPT-NeoX Does

  • Trains autoregressive transformer language models at scales from millions to tens of billions of parameters
  • Combines Megatron-style tensor parallelism with DeepSpeed ZeRO for efficient distributed training
  • Supports rotary positional embeddings, parallel attention-FFN, and other modern LLM architecture choices
  • Provides YAML-based configuration for full control over model architecture and training hyperparameters (see the config sketch after this list)
  • Includes evaluation harness integration for benchmarking trained models
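To make the configuration style concrete, here is a minimal sketch of what a GPT-NeoX-style YAML config looks like, loaded with PyYAML so its structure is easy to inspect. The key names mirror the small example configs shipped in the repository's configs/ directory, but the specific values are illustrative assumptions, not a recommended recipe.

```python
# Minimal sketch of a GPT-NeoX-style YAML config; key names follow the small
# configs bundled with the repo, values are illustrative only.
import yaml

config_text = """
# Model architecture
num_layers: 12
hidden_size: 768
num_attention_heads: 12
seq_length: 2048
max_position_embeddings: 2048
pos_emb: rotary                      # rotary positional embeddings

# Parallelism layout
pipe_parallel_size: 1
model_parallel_size: 1

# Training hyperparameters
train_micro_batch_size_per_gpu: 4
gradient_accumulation_steps: 8
train_iters: 320000
optimizer:
  type: Adam
  params:
    lr: 0.0006
    betas: [0.9, 0.95]

# Mixed precision and ZeRO settings, passed through to DeepSpeed
fp16:
  enabled: true
zero_optimization:
  stage: 1
"""

config = yaml.safe_load(config_text)
print(config["hidden_size"], config["pos_emb"])
```

In practice several such files are passed to the training launcher together and merged, which is what makes the hierarchical overrides described below possible.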

Architecture Overview

GPT-NeoX fuses NVIDIA Megatron-LM's tensor and pipeline parallelism with Microsoft DeepSpeed's ZeRO optimizer stages. The training engine distributes model parameters, gradients, and optimizer states across GPUs, enabling models that exceed single-GPU memory. Model architecture and training settings are specified through composable YAML configs that override defaults hierarchically.
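To make the layout concrete, here is a small sketch of how a GPU count decomposes under Megatron/DeepSpeed-style 3D parallelism. The decomposition rule (data-parallel degree = world size divided by tensor times pipeline parallelism) is standard; the function name and example sizes are illustrative assumptions, not GPT-NeoX's actual bookkeeping.

```python
# Sketch of 3D-parallelism bookkeeping: the data-parallel degree is whatever
# remains after tensor and pipeline parallelism have carved up the world size.

def parallel_layout(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> dict:
    """Return the implied data-parallel degree for a given layout."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tensor * pipeline parallelism")
    return {
        "tensor_parallel": tensor_parallel,
        "pipeline_parallel": pipeline_parallel,
        "data_parallel": world_size // model_parallel,
    }

# 96 GPUs split 2-way tensor x 4-way pipeline leaves 12 data-parallel replicas;
# DeepSpeed ZeRO then shards optimizer states across those replicas.
print(parallel_layout(world_size=96, tensor_parallel=2, pipeline_parallel=4))
```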

Self-Hosting & Configuration

  • Requires Python 3.8+, PyTorch 1.8+, and NVIDIA GPUs with NCCL
  • Multi-node training uses SSH or a cluster scheduler like SLURM
  • All architecture and training options are set via YAML config files
  • Pre-built Docker containers available for reproducible environments
  • Data preprocessing scripts convert raw text to tokenized binary shards
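As a hedged sketch of that last step, the snippet below drives the bundled Megatron-style preprocessing script from Python. The flag names (--input, --output-prefix, --tokenizer-type, and so on) follow the script shipped under tools/ in the GPT-NeoX repository, but the script's path and options have moved between releases, so check your checkout; the file names are placeholders.

```python
# Hedged sketch: convert raw JSONL text into tokenized .bin/.idx shards by
# invoking the repo's Megatron-style preprocessing script. Verify the script
# path and flag names against your GPT-NeoX checkout.
import subprocess

subprocess.run(
    [
        "python", "tools/preprocess_data.py",   # location may differ by version
        "--input", "data/mydataset.jsonl",      # one JSON document per line
        "--output-prefix", "data/mydataset",    # produces .bin/.idx shards
        "--tokenizer-type", "HFTokenizer",
        "--vocab-file", "data/tokenizer.json",
        "--append-eod",                         # append an end-of-document token
        "--workers", "8",
    ],
    check=True,
)
```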

Key Features

  • Scales from a single GPU to hundreds of GPUs with model and data parallelism
  • YAML-driven configuration makes experiments reproducible and easy to iterate
  • Produced the Pythia model suite used in hundreds of research papers
  • Supports FlashAttention, fused kernels, and mixed-precision training
  • Evaluation pipeline integrates with EleutherAI's lm-evaluation-harness
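As one illustrative route to that last point, the sketch below scores a released Pythia checkpoint with the standalone lm-evaluation-harness Python API (0.4.x-style simple_evaluate). GPT-NeoX integrates the same harness for checkpoints it trains; the model name, tasks, and batch size here are assumptions chosen for illustration.

```python
# Hedged sketch: evaluate a public Pythia checkpoint with EleutherAI's
# lm-evaluation-harness via its Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["lambada_openai", "piqa"],
    batch_size=8,
)
print(results["results"])  # per-task metrics such as accuracy and perplexity
```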

Comparison with Similar Tools

  • Megatron-LM — NVIDIA's training framework; GPT-NeoX adds DeepSpeed integration and simpler configuration
  • DeepSpeed — optimization library; GPT-NeoX provides the full model definition and training loop on top of DeepSpeed
  • LitGPT — Lightning-based GPT training; simpler setup but less flexibility at very large scale
  • llm.c — minimal C/CUDA implementation; GPT-NeoX targets production-scale distributed training

FAQ

Q: Can I train a model from scratch with GPT-NeoX? A: Yes. It supports full pre-training from raw text data, including tokenization, data sharding, and distributed training.

Q: What models were trained with GPT-NeoX? A: GPT-NeoX-20B, the Pythia suite (70M to 12B parameters), and derivatives such as Dolly 2.0, which was fine-tuned from Pythia-12B.

Q: How many GPUs do I need? A: A small model can train on a single GPU. Reproducing GPT-NeoX-20B used 96 A100 GPUs.
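A rough memory estimate helps explain those GPU counts. The sketch below uses the common rule of thumb of about 16 bytes per parameter for fp16 weights, fp16 gradients, and fp32 Adam states, and ignores activations entirely, so it is a lower bound rather than GPT-NeoX's actual accounting.

```python
# Lower-bound GPU count from optimizer-state memory alone:
# ~16 bytes/param = fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + Adam first and second moments (4 + 4). Activations come on top.
import math

def min_gpus(n_params: float, gpu_memory_gb: float = 40.0, bytes_per_param: int = 16) -> int:
    """Minimum GPUs needed just to hold weights, gradients, and Adam states."""
    state_gb = n_params * bytes_per_param / 1e9
    return math.ceil(state_gb / gpu_memory_gb)

print(min_gpus(160e6))  # -> 1: a Pythia-160M-sized model fits on a single GPU
print(min_gpus(20e9))   # -> 8: ~320 GB of states before counting activations
```

The real GPT-NeoX-20B run used 96 A100s rather than this floor because activation memory, batch size, and training throughput all matter beyond simply fitting the parameters.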

Q: Is GPT-NeoX still actively developed? A: The core codebase is stable. EleutherAI continues to use and maintain it for new research projects.
