
GPT-NeoX — Open-Source Large Language Model Training Library

A GPU-optimized library by EleutherAI for training large-scale autoregressive language models. GPT-NeoX powered the training of GPT-NeoX-20B and Pythia, providing the open-source community with tools for billion-parameter model training.

Introduction

GPT-NeoX is EleutherAI's distributed training framework built on top of Megatron-LM and DeepSpeed. It was designed to make training billion-parameter language models accessible to the open-source research community, and it produced the GPT-NeoX-20B and Pythia model suites.

What GPT-NeoX Does

  • Trains autoregressive transformer language models at scales from millions to tens of billions of parameters
  • Combines Megatron-style tensor parallelism with DeepSpeed ZeRO for efficient distributed training
  • Supports rotary positional embeddings, parallel attention-FFN, and other modern LLM architecture choices
  • Provides YAML-based configuration for full control over model architecture and training hyperparameters (a minimal config sketch follows this list)
  • Includes evaluation harness integration for benchmarking trained models
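
To make the YAML-driven workflow concrete, below is a minimal sketch of a small model config in the style of the repository's example files (configs/125M.yml is the usual starting point). Exact key names can shift between GPT-NeoX versions, so treat the keys here as illustrative rather than canonical.

```yaml
# Sketch of a small GPT-NeoX model config, modeled on the repository's
# example configs (e.g. configs/125M.yml). Keys are illustrative and may
# differ across versions.
{
  # parallelism degrees
  "pipe-parallel-size": 1,
  "model-parallel-size": 1,

  # architecture
  "num-layers": 12,
  "hidden-size": 768,
  "num-attention-heads": 12,
  "seq-length": 2048,
  "max-position-embeddings": 2048,
  "pos-emb": "rotary",              # rotary positional embeddings

  # training hyperparameters
  "train-micro-batch-size-per-gpu": 4,
  "train-iters": 320000,
  "optimizer": {
    "type": "Adam",
    "params": {"lr": 0.0006, "betas": [0.9, 0.95], "eps": 1.0e-8}
  }
}
```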

Architecture Overview

GPT-NeoX fuses NVIDIA Megatron-LM's tensor and pipeline parallelism with Microsoft DeepSpeed's ZeRO optimizer stages. The training engine distributes model parameters, gradients, and optimizer states across GPUs, enabling models that exceed single-GPU memory. Model architecture and training settings are specified through composable YAML configs that override defaults hierarchically.
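The hierarchical override is easiest to see with two files: a base model config and a small per-cluster file merged with it at launch. The repository's documented pattern is to pass several YAML files to the deepy.py launcher; the file names and values below are hypothetical, and the merge semantics for overlapping keys should be verified against the version in use.

```yaml
# model.yml -- base architecture and schedule (hypothetical values)
{
  "num-layers": 24,
  "hidden-size": 1024,
  "zero_optimization": {"stage": 0}
}

# cluster.yml -- a second file merged with model.yml at launch, e.g.
#   python ./deepy.py train.py model.yml cluster.yml
# overlapping keys such as zero_optimization resolve to one value
{
  "zero_optimization": {"stage": 1},        # enable ZeRO stage 1 here
  "train-micro-batch-size-per-gpu": 8
}
```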

Self-Hosting & Configuration

  • Requires Python 3.8+, PyTorch 1.8+, and NVIDIA GPUs with NCCL
  • Multi-node training uses SSH or a cluster scheduler like SLURM
  • All architecture and training options are set via YAML config files
  • Pre-built Docker containers available for reproducible environments
  • Data preprocessing scripts convert raw text to tokenized binary shards (see the data and hostfile sketch after this list)
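
A sketch of the moving parts for data and multi-node launches follows. The preprocessing flags mirror the Megatron-style script GPT-NeoX ships, and the hostfile follows DeepSpeed's format, but the exact script path, flags, and key names are assumptions that may vary by version.

```yaml
# data.yml -- point training at tokenized shards (sketch; key names may
# differ by GPT-NeoX version). The shards come from the repo's
# preprocessing script, invoked roughly as:
#   python tools/preprocess_data.py --input corpus.jsonl \
#     --output-prefix data/corpus --tokenizer-type GPT2BPETokenizer \
#     --vocab gpt2-vocab.json --merge-file gpt2-merges.txt --append-eod
{
  "train-data-paths": ["data/corpus_text_document"],
  "vocab-file": "data/gpt2-vocab.json",
  "merge-file": "data/gpt2-merges.txt",
  "tokenizer-type": "GPT2BPETokenizer",

  # DeepSpeed-style hostfile for multi-node SSH launches; each line of
  # the file reads "<hostname> slots=<num_gpus>"
  "hostfile": "/job/hostfile"
}
```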

Key Features

  • Scales from a single GPU to hundreds of GPUs with model and data parallelism
  • YAML-driven configuration makes experiments reproducible and easy to iterate
  • Produced the Pythia model suite used in hundreds of research papers
  • Supports FlashAttention, fused kernels, and mixed-precision training
  • Evaluation pipeline integrates with EleutherAI's lm-evaluation-harness (configuration sketch below)
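
As a sketch of how these features surface in configuration: the fp16 block follows the DeepSpeed schema that GPT-NeoX passes through, while the eval_tasks key is shown with assumed task names; verify both against the version in use.

```yaml
# precision-and-eval.yml -- sketch; exact keys vary by version
{
  # DeepSpeed-style mixed-precision block passed through by GPT-NeoX
  "fp16": {
    "enabled": true,
    "loss_scale": 0,             # 0 requests dynamic loss scaling
    "loss_scale_window": 1000,
    "min_loss_scale": 1
  },

  # lm-evaluation-harness tasks to run at evaluation time
  "eval_tasks": ["lambada_openai", "hellaswag", "piqa"]
}
```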

Comparison with Similar Tools

  • Megatron-LM — NVIDIA's training framework; GPT-NeoX adds DeepSpeed integration and simpler configuration
  • DeepSpeed — optimization library; GPT-NeoX provides the full model definition and training loop on top of DeepSpeed
  • LitGPT — Lightning-based GPT training; simpler setup but less flexibility at very large scale
  • llm.c — minimal C/CUDA implementation; GPT-NeoX targets production-scale distributed training

FAQ

Q: Can I train a model from scratch with GPT-NeoX? A: Yes. It supports full pre-training from raw text data, including tokenization, data sharding, and distributed training.

Q: What models were trained with GPT-NeoX? A: GPT-NeoX-20B and the Pythia suite (70M to 12B parameters), as well as derivatives such as Dolly 2.0, which fine-tunes Pythia-12B.

Q: How many GPUs do I need? A: A small model can train on a single GPU. Reproducing GPT-NeoX-20B used 96 A100 GPUs.
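
Scaling up is mostly a matter of raising the parallelism degrees in the config. The sketch below uses hypothetical numbers for a 96-GPU run; the product of tensor, pipeline, and data parallelism must equal the total GPU count.

```yaml
# Hypothetical parallelism settings for 96 GPUs (12 nodes x 8 GPUs):
# 2 (tensor) x 4 (pipeline) leaves 96 / 8 = 12 as the data-parallel
# degree, which the framework derives from the world size.
{
  "model-parallel-size": 2,     # tensor parallelism, within a node
  "pipe-parallel-size": 4       # pipeline stages, across nodes
}
```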

Q: Is GPT-NeoX still actively developed? A: The core codebase is stable. EleutherAI continues to use and maintain it for new research projects.
