Prompts · Apr 8, 2026 · 2 min read

LitGPT — Fine-Tune and Deploy AI Models Simply

Lightning AI's framework for fine-tuning and serving 20+ LLM families. LitGPT supports LoRA, QLoRA, and full fine-tuning, with one-command training on consumer hardware.

TL;DR
Lightning AI framework for fine-tuning and serving 20+ LLM families with LoRA, QLoRA, and full fine-tuning on consumer GPUs.
§01

What it is

LitGPT is Lightning AI's framework for fine-tuning and serving large language models. It supports 20+ model families, including Llama, Mistral, Gemma, and Phi, and training methods spanning LoRA, QLoRA, and full fine-tuning. The framework is designed for one-command operations: download a model, chat with it, or fine-tune it with a single CLI command.

LitGPT targets ML engineers and developers who want to fine-tune open-source LLMs without building custom training infrastructure. It handles quantization, memory optimization, and distributed training transparently.

§02

How it saves time or tokens

LitGPT reduces fine-tuning from a multi-day infrastructure project to a single CLI command. QLoRA support means you can fine-tune a 7B parameter model on a single consumer GPU (RTX 3090 or above). The framework handles gradient checkpointing, mixed precision, and data loading automatically. For serving, the same framework deploys the fine-tuned model with optimized inference.
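The VRAM saving from quantization can be sanity-checked with quick arithmetic. A rough sketch below uses the standard approximations of 2 bytes per parameter for fp16 weights and ~0.5 bytes for 4-bit NF4; optimizer state and activations are ignored, so real usage is higher:

```python
# Back-of-envelope VRAM estimate for holding a 7B-parameter model's weights.
# Assumed figures: fp16 = 2 bytes/param, 4-bit NF4 ~= 0.5 bytes/param.
# Optimizer state and activations are ignored, so real usage is higher.

def weight_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

params_7b = 7e9
fp16_gb = weight_vram_gb(params_7b, 2.0)    # full-precision-ish baseline
nf4_gb = weight_vram_gb(params_7b, 0.5)     # QLoRA-style 4-bit base model

print(f"fp16 weights: ~{fp16_gb:.1f} GiB")  # ~13.0 GiB
print(f"nf4 weights:  ~{nf4_gb:.1f} GiB")   # ~3.3 GiB
```

This is why the 7B base model alone leaves room for LoRA adapters and activations on a 24GB consumer card.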

§03

How to use

  1. Install LitGPT:
pip install litgpt
  2. Download and chat with a model:
litgpt download meta-llama/Llama-3.1-8B-Instruct
litgpt chat meta-llama/Llama-3.1-8B-Instruct
  3. Fine-tune with LoRA:
litgpt finetune_lora meta-llama/Llama-3.1-8B-Instruct \
  --data JSON --data.json_path training_data.json
§04

Example

Complete fine-tuning workflow:

# Prepare training data as JSON
cat training_data.json
# [{"instruction": "Summarize this article", "input": "...", "output": "..."}]

# Fine-tune with QLoRA (fits on 24GB GPU)
litgpt finetune_lora meta-llama/Llama-3.1-8B-Instruct \
  --data JSON \
  --data.json_path training_data.json \
  --quantize bnb.nf4 \
  --train.epochs 3

# Chat with the fine-tuned model
litgpt chat checkpoints/fine-tuned/

# Serve the model
litgpt serve checkpoints/fine-tuned/ --port 8000
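Once the server is running, the model can be queried over HTTP. A minimal client sketch using only the standard library; the `/predict` route and `{"prompt": ...}` payload follow LitGPT's serve examples, but the exact response fields may vary by version:

```python
import json
import urllib.request

def build_request(prompt: str, url: str = "http://127.0.0.1:8000/predict"):
    """Build a POST request carrying the prompt as a JSON body."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_request("Summarize this article: ...")
print(req.get_full_url())    # http://127.0.0.1:8000/predict
print(json.loads(req.data))  # {'prompt': 'Summarize this article: ...'}

# To actually send the request (requires the litgpt server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```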
§05


Common pitfalls

  • QLoRA requires sufficient VRAM for the quantized model plus LoRA adapters. A 7B model needs roughly 12GB VRAM minimum with QLoRA.
  • Training data format must match LitGPT's expected schema (instruction/input/output JSON). Misformatted data causes silent training issues.
  • Fine-tuned models may overfit on small datasets. Use validation splits and monitor training loss to detect overfitting early.
  • Always check the official documentation for the latest version-specific changes and migration guides before upgrading in production environments.
  • For team deployments, establish clear guidelines on configuration and usage patterns to ensure consistency across developers.
  • Model downloads can be large (several GB for 7B+ models). Ensure adequate disk space and a stable internet connection before starting the download process.
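A quick schema check before launching a run catches the malformed-data pitfall above instead of a silently bad fine-tune. A minimal sketch: the instruction/input/output keys match the JSON format shown in the example, `validate_records` is a hypothetical helper (not part of LitGPT), and treating `input` as optional is an assumption:

```python
import json

REQUIRED_KEYS = {"instruction", "output"}   # assumes "input" may be omitted
ALLOWED_KEYS = REQUIRED_KEYS | {"input"}

def validate_records(records):
    """Return a list of (index, problem) pairs; empty means the data looks OK."""
    problems = []
    if not isinstance(records, list):
        return [(0, "top-level JSON must be a list of records")]
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "record is not an object"))
            continue
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            problems.append((i, f"missing keys: {sorted(missing)}"))
        extra = rec.keys() - ALLOWED_KEYS
        if extra:
            problems.append((i, f"unexpected keys: {sorted(extra)}"))
    return problems

data = json.loads('[{"instruction": "Summarize", "input": "...", "output": "..."}]')
print(validate_records(data))   # [] -> safe to pass to litgpt finetune_lora
```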

Frequently Asked Questions

What model families does LitGPT support?

LitGPT supports 20+ families including Llama, Mistral, Mixtral, Gemma, Phi, Falcon, StableLM, and others. New models are added as they are released. Check the documentation for the current list.

Can I fine-tune on a single consumer GPU?

Yes. QLoRA support lets you fine-tune 7B-13B parameter models on GPUs with 24GB VRAM (RTX 3090, RTX 4090). Larger models require multi-GPU setups or more VRAM.

What is the difference between LoRA and QLoRA?

LoRA adds small trainable adapter layers to the model while keeping the base weights frozen. QLoRA adds quantization (4-bit) on top of LoRA, reducing VRAM requirements by roughly 4x. QLoRA is recommended for consumer hardware.
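The adapter idea in the answer above can be sketched in a few lines of numpy: the frozen base weight W is augmented with a low-rank product B·A, and only A and B would be trained. This is a conceptual illustration of the LoRA math, not LitGPT's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4      # rank << d_in keeps the adapter tiny

W = rng.normal(size=(d_in, d_out))           # frozen base weight
A = rng.normal(size=(rank, d_out)) * 0.01    # trainable
B = np.zeros((d_in, rank))                   # trainable, zero-init

def lora_forward(x):
    # Base path plus low-rank update: x W + x B A
    return x @ W + x @ B @ A

x = rng.normal(size=(8, d_in))
# With B zero-initialized, the adapted layer starts out exactly
# matching the frozen base layer.
assert np.allclose(lora_forward(x), x @ W)

# Adapter parameters vs full weight parameters:
print(W.size, A.size + B.size)   # 4096 512
```

Only 512 of 4,608 parameters are trainable here, which is the whole point: gradients and optimizer state are needed only for the adapters.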

Can I serve fine-tuned models with LitGPT?

Yes. LitGPT includes a serve command that starts an OpenAI-compatible API server for your fine-tuned model. This lets you integrate the model into applications using standard API calls.

Does LitGPT support distributed training?

Yes. LitGPT leverages Lightning's distributed training capabilities. Multi-GPU and multi-node training work transparently for both full fine-tuning and LoRA methods.

🙏

Source & Thanks

Created by Lightning AI. Licensed under Apache 2.0.

Lightning-AI/litgpt — 10k+ stars
