Pipeline stages
The training pipeline consists of four sequential stages:
Pretraining
Train the language model from random initialization on large text corpora (WikiText-2, OpenWebText, TinyStories, Wikipedia). The model learns basic language understanding and generation capabilities through next-token prediction.
Supervised fine-tuning (SFT)
Fine-tune the pretrained model on instruction-following datasets (Alpaca, Dolly, OpenOrca). The model learns to follow instructions and respond to user queries in a conversational format.
Direct preference optimization (DPO)
Further align the SFT model using preference data (Anthropic HH-RLHF). The model learns to prefer chosen responses over rejected ones, improving alignment with human preferences without requiring a reward model.
Verifier training
Train a verifier model that judges answer correctness. This final stage completes the four-stage pipeline.
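The DPO objective can be sketched per preference pair in plain Python. This is a minimal sketch, not the project's implementation: the function name, the β default, and the use of summed sequence log-probabilities are illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the chosen/rejected responses under
    the policy being trained and under a frozen reference model (the SFT
    checkpoint). No reward model is needed: the implicit reward is the
    log-probability ratio between policy and reference.
    """
    # Implicit rewards: how much the policy up-weights each response vs. the reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Logistic loss -log(sigmoid(margin)) pushes the chosen reward above the rejected one.
    margin = chosen_reward - rejected_reward
    return math.log1p(math.exp(-margin))
```

When policy and reference agree, the margin is zero and the loss is log 2; as the policy assigns relatively more probability to the chosen response, the loss falls toward zero.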
Quick start
Run the full pipeline
The simplest way to run the entire pipeline is with the unified runner script.
Run individual stages
You can also run stages individually.
Config presets
The pipeline comes with four built-in configuration presets optimized for different hardware and time constraints:
local-smoke
Quick smoke test (~5 minutes) for validation:
- Model: d=256, L=4, H=4 (~10M params)
- Steps: 100 pretrain / 50 SFT / 50 DPO / 50 verifier
- Hardware: Works on CPU or any GPU
- Use case: CI/CD testing, quick validation
local
Full training for consumer GPUs (RTX 3060) running ~24 hours:
- Model: d=768, L=12, H=12 (~117M params)
- Steps: 20K pretrain / 5K SFT / 2K DPO / 3K verifier
- Hardware: RTX 3060 or better (12GB VRAM)
- Use case: Research experiments, local development
gpu-smoke
Quick GPU test (~10 minutes):
- Model: d=256, L=4, H=4 (~10M params)
- Steps: 100 pretrain / 50 SFT / 50 DPO / 50 verifier
- Hardware: Any modern GPU
- Use case: Testing distributed training, GPU cluster validation
gpu
High-quality training for datacenter GPUs (A100/H100) running ~48 hours:
- Model: d=1024, L=12, H=16 (~350M params)
- Steps: 80K pretrain / 10K SFT / 3K DPO / 3K verifier
- Datasets: Wikipedia + OpenWebText + WikiText-103 + TinyStories (100K)
- Hardware: A100/H100 with 40-80GB VRAM
- Use case: Production models, benchmark results
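For reference, the presets above can be summarized as a lookup table. This is an illustrative sketch only: the key and field names are assumptions, not the project's actual config schema.

```python
# Summary of the four built-in presets (field names are illustrative).
PRESETS = {
    "local-smoke": {"d_model": 256,  "n_layers": 4,  "n_heads": 4,
                    "pretrain_steps": 100,    "sft_steps": 50,     "dpo_steps": 50,    "verifier_steps": 50},
    "local":       {"d_model": 768,  "n_layers": 12, "n_heads": 12,
                    "pretrain_steps": 20_000, "sft_steps": 5_000,  "dpo_steps": 2_000, "verifier_steps": 3_000},
    "gpu-smoke":   {"d_model": 256,  "n_layers": 4,  "n_heads": 4,
                    "pretrain_steps": 100,    "sft_steps": 50,     "dpo_steps": 50,    "verifier_steps": 50},
    "gpu":         {"d_model": 1024, "n_layers": 12, "n_heads": 16,
                    "pretrain_steps": 80_000, "sft_steps": 10_000, "dpo_steps": 3_000, "verifier_steps": 3_000},
}

def get_preset(name):
    # Fail fast on unknown preset names rather than silently defaulting.
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    return PRESETS[name]
```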
Configuration
Override hyperparameters
You can override specific hyperparameters via command-line arguments.
Custom config files
For more control, create a custom JSON config file:
configs/custom.json
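Such a file might look roughly like the following, here mirroring the local preset. The field names are assumptions for illustration; check the project's actual schema before use.

```json
{
  "model": { "d_model": 768, "n_layers": 12, "n_heads": 12 },
  "train": {
    "pretrain_steps": 20000,
    "sft_steps": 5000,
    "dpo_steps": 2000,
    "verifier_steps": 3000
  }
}
```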
Output structure
The pipeline writes its outputs into a structured directory tree. Each checkpoint (.pt file) contains:
- model_state: Model weights
- optimizer_state: Optimizer state for resumption
- config: Model architecture config
- step: Training step number
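The checkpoint layout can be sketched as a plain dictionary. The real files are saved with torch.save; the values below are placeholders, and the helper is a hypothetical illustration of why all four fields matter for resumption.

```python
# Sketch of the checkpoint layout described above (placeholder values).
checkpoint = {
    "model_state": {"embed.weight": "...tensor..."},        # model weights
    "optimizer_state": {"state": {}, "param_groups": []},   # optimizer state for resumption
    "config": {"d_model": 256, "n_layers": 4, "n_heads": 4},  # architecture config
    "step": 100,                                            # training step number
}

def can_resume(ckpt):
    # Resuming training needs all four fields; inference alone needs only
    # model_state and config.
    required = {"model_state", "optimizer_state", "config", "step"}
    return required <= ckpt.keys()
```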
Pipeline state
When running the full pipeline with --stage all, the runner saves a pipeline_state.json file tracking all checkpoint paths.
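A sketch of what such a state file might hold and how it round-trips. The key names and paths are illustrative assumptions, not the runner's actual format; the point is that each stage's checkpoint path is recorded so later stages (or a resumed run) can find their inputs.

```python
import json
import os
import tempfile

# Hypothetical pipeline_state.json contents: one checkpoint path per stage.
pipeline_state = {
    "pretrain_checkpoint": "runs/pretrain/checkpoint_final.pt",
    "sft_checkpoint": "runs/sft/checkpoint_final.pt",
    "dpo_checkpoint": "runs/dpo/checkpoint_final.pt",
    "verifier_checkpoint": "runs/verifier/checkpoint_final.pt",
}

# Write the state file, then reload it to show the paths round-trip intact.
out_dir = tempfile.mkdtemp()
state_path = os.path.join(out_dir, "pipeline_state.json")
with open(state_path, "w") as f:
    json.dump(pipeline_state, f, indent=2)

with open(state_path) as f:
    reloaded = json.load(f)
```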
Next steps
Pretraining
Learn about the pretraining stage and dataset options
SFT
Understand supervised fine-tuning on instruction data
DPO
Explore preference alignment with DPO
Verifier
Train a verifier for answer correctness