Learning rate schedulers adjust the learning rate during training according to predefined schedules. This helps improve convergence and prevent overshooting optimal solutions.
Base Scheduler
All schedulers (except ReduceLROnPlateau) extend the LRScheduler base class, which provides:
- `step()`: Update learning rates for the next epoch
- `getLr()`: Compute learning rates for the current epoch
- `getLastLr()`: Get the current learning rates for all parameter groups
- `epoch`: Get the current epoch number
StepLR
Decays the learning rate by gamma every stepSize epochs.
Formula: lr = baseLr * gamma^(epoch // stepSize)
```ts
import { SGD, StepLR } from 'deepbox/optim';

const optimizer = new SGD(model.parameters(), { lr: 0.1 });
const scheduler = new StepLR(optimizer, { stepSize: 30, gamma: 0.1 });

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}
```
Constructor
- `optimizer`: The optimizer whose learning rate will be scheduled
- `stepSize`: Number of epochs between learning rate decays (must be a positive integer)
- `gamma`: Multiplicative factor for learning rate decay (must be positive)
- `lastEpoch`: Index of the last epoch (used for resuming training)
Example Schedule
```ts
const scheduler = new StepLR(optimizer, { stepSize: 30, gamma: 0.1 });
// Epochs 0-29:  lr = 0.1
// Epochs 30-59: lr = 0.01
// Epochs 60-89: lr = 0.001
```
MultiStepLR
Decays the learning rate by gamma when the epoch reaches one of the milestones.
```ts
import { MultiStepLR } from 'deepbox/optim';

const scheduler = new MultiStepLR(optimizer, {
  milestones: [30, 80],
  gamma: 0.1
});
```
Constructor
- `optimizer`: The optimizer to schedule
- `milestones`: List of epoch indices at which to decay the learning rate (must be strictly increasing non-negative integers)
- `gamma`: Multiplicative factor for learning rate decay
Example Schedule
```ts
const scheduler = new MultiStepLR(optimizer, {
  milestones: [30, 80],
  gamma: 0.1
});
// Epochs 0-29:  lr = 0.1
// Epochs 30-79: lr = 0.01
// Epochs 80+:   lr = 0.001
```
ExponentialLR
Decays the learning rate exponentially every epoch.
Formula: lr = baseLr * gamma^epoch
```ts
import { ExponentialLR } from 'deepbox/optim';

const scheduler = new ExponentialLR(optimizer, { gamma: 0.95 });
// lr *= 0.95 each epoch
```
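When choosing `gamma`, it can help to think in terms of how many epochs it takes to halve the learning rate. A small standalone sketch of the decay math (the helper names here are illustrative, not part of deepbox):

```ts
// lr after `epoch` steps of exponential decay: lr = baseLr * gamma^epoch
function expLr(baseLr: number, gamma: number, epoch: number): number {
  return baseLr * Math.pow(gamma, epoch);
}

// Epochs needed to halve the lr: gamma^n = 0.5  =>  n = ln(0.5) / ln(gamma)
function halfLife(gamma: number): number {
  return Math.log(0.5) / Math.log(gamma);
}

console.log(expLr(0.1, 0.95, 10)); // ≈ 0.0599
console.log(halfLife(0.95));       // ≈ 13.5 epochs per halving
```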
Constructor
- `optimizer`: The optimizer to schedule
- `gamma`: Multiplicative factor for exponential decay (must be positive)
CosineAnnealingLR
Sets the learning rate using a cosine annealing schedule. The learning rate oscillates between the base learning rate and etaMin following a cosine curve.
Formula: lr = etaMin + (baseLr - etaMin) * (1 + cos(π * epoch / T_max)) / 2
```ts
import { CosineAnnealingLR } from 'deepbox/optim';

const scheduler = new CosineAnnealingLR(optimizer, {
  T_max: 100,
  etaMin: 0.001
});
```
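The cosine formula above can be sanity-checked directly. A minimal sketch of the math in plain TypeScript (independent of the library):

```ts
// lr = etaMin + (baseLr - etaMin) * (1 + cos(pi * epoch / tMax)) / 2
function cosineLr(baseLr: number, etaMin: number, tMax: number, epoch: number): number {
  return etaMin + (baseLr - etaMin) * (1 + Math.cos(Math.PI * epoch / tMax)) / 2;
}

console.log(cosineLr(0.1, 0.001, 100, 0));   // start of cycle: ≈ 0.1 (base lr)
console.log(cosineLr(0.1, 0.001, 100, 50));  // midpoint: ≈ 0.0505
console.log(cosineLr(0.1, 0.001, 100, 100)); // end of cycle: ≈ 0.001 (etaMin)
```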
Constructor
- `optimizer`: The optimizer to schedule
- `T_max`: Maximum number of epochs (one cosine cycle period). The property name `tMax` is also accepted.
- `etaMin`: Minimum learning rate (must be non-negative)
Training Example
```ts
const scheduler = new CosineAnnealingLR(optimizer, {
  T_max: 100,
  etaMin: 0.001
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}
```
OneCycleLR
Implements the 1cycle learning rate policy. The learning rate starts at maxLr/divFactor, increases to maxLr over pctStart of the training, then decreases to maxLr/finalDivFactor.
```ts
import { OneCycleLR } from 'deepbox/optim';

const scheduler = new OneCycleLR(optimizer, {
  maxLr: 0.1,
  totalSteps: 1000,
  pctStart: 0.3
});
```
Constructor
- `optimizer`: The optimizer to schedule
- `maxLr`: Maximum learning rate (must be positive)
- `totalSteps`: Total number of training steps (must be a positive integer)
- `pctStart`: Fraction of the cycle spent increasing the learning rate (range: 0-1)
- `divFactor`: Initial learning rate divisor: `initialLr = maxLr / divFactor`
- `finalDivFactor`: Final learning rate divisor: `minLr = maxLr / finalDivFactor`
- `annealStrategy`: `'cos' | 'linear'` (default: `'cos'`). Annealing strategy for the decreasing phase
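The shape of the schedule can be sketched from the description above. This is illustrative math for the `'cos'` anneal strategy, with arbitrarily chosen `divFactor`/`finalDivFactor` values; it is not deepbox's implementation:

```ts
// Illustrative one-cycle lr at a given step, assuming cosine annealing
// in both phases.
function oneCycleLr(
  maxLr: number, totalSteps: number, pctStart: number,
  divFactor: number, finalDivFactor: number, step: number
): number {
  const initialLr = maxLr / divFactor;
  const minLr = maxLr / finalDivFactor;
  const upSteps = Math.floor(totalSteps * pctStart);
  // Cosine interpolation from `from` to `to` as pct goes 0 -> 1.
  const anneal = (from: number, to: number, pct: number): number =>
    to + (from - to) * (1 + Math.cos(Math.PI * pct)) / 2;
  return step < upSteps
    ? anneal(initialLr, maxLr, step / upSteps)                         // increasing phase
    : anneal(maxLr, minLr, (step - upSteps) / (totalSteps - upSteps)); // decreasing phase
}

console.log(oneCycleLr(0.1, 1000, 0.3, 25, 1e4, 0));    // start: ≈ 0.004 (maxLr / 25)
console.log(oneCycleLr(0.1, 1000, 0.3, 25, 1e4, 300));  // peak: ≈ 0.1 (maxLr)
console.log(oneCycleLr(0.1, 1000, 0.3, 25, 1e4, 1000)); // end: ≈ 0.00001 (maxLr / 1e4)
```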
Training Example
```ts
const totalSteps = numEpochs * stepsPerEpoch;
const scheduler = new OneCycleLR(optimizer, {
  maxLr: 0.1,
  totalSteps: totalSteps,
  pctStart: 0.3
});

for (let epoch = 0; epoch < numEpochs; epoch++) {
  for (const batch of dataLoader) {
    train(batch);
    scheduler.step(); // Step per batch, not per epoch
  }
}
```
ReduceLROnPlateau
Reduces learning rate when a metric has stopped improving. Unlike other schedulers, this one requires a metric value to be passed to step().
```ts
import { ReduceLROnPlateau } from 'deepbox/optim';

const scheduler = new ReduceLROnPlateau(optimizer, {
  mode: 'min',
  factor: 0.1,
  patience: 10
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  const valLoss = validate();
  scheduler.step(valLoss);
}
```
Constructor
- `optimizer`: The optimizer to schedule
- `mode`: `'min' | 'max'` (default: `'min'`). Whether to minimize or maximize the metric
- `factor`: Factor by which to reduce the learning rate (range: 0-1)
- `patience`: Number of epochs with no improvement before reducing the learning rate
- `threshold`: Threshold for measuring improvement
- `cooldown`: Number of epochs to wait before resuming normal operation after a learning rate reduction
- `minLr`: Minimum learning rate (lower bound)
Methods
- `step(metric)`: Update learning rates based on the metric value. Reduces the learning rate if there has been no improvement for `patience` epochs.
- `getLastLr()`: Get current learning rates for all parameter groups
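The core plateau logic can be sketched in a few lines. This is a simplified illustration of the idea (`'min'` mode, an absolute improvement threshold, no cooldown); the class and field names are hypothetical, not the library's internals:

```ts
// Tracks the best metric seen so far and cuts the lr after `patience`
// consecutive epochs without improvement.
class PlateauTracker {
  private best = Infinity;
  private badEpochs = 0;

  constructor(
    private lr: number,
    private readonly factor: number,
    private readonly patience: number,
    private readonly threshold = 1e-4,
    private readonly minLr = 0
  ) {}

  step(metric: number): number {
    if (metric < this.best - this.threshold) {
      this.best = metric; // improvement: reset the patience counter
      this.badEpochs = 0;
    } else if (++this.badEpochs > this.patience) {
      this.lr = Math.max(this.lr * this.factor, this.minLr); // plateau: cut the lr
      this.badEpochs = 0;
    }
    return this.lr;
  }
}

const tracker = new PlateauTracker(0.1, 0.5, 2);
console.log([1.0, 0.9, 0.9, 0.9, 0.9].map(l => tracker.step(l)));
// lr stays at 0.1 until patience is exceeded, then drops to 0.05
```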
Training Example
```ts
const scheduler = new ReduceLROnPlateau(optimizer, {
  mode: 'min',
  factor: 0.1,
  patience: 10,
  threshold: 1e-4
});

for (let epoch = 0; epoch < numEpochs; epoch++) {
  const trainLoss = train();
  const valLoss = validate();
  // Reduce LR if validation loss plateaus
  scheduler.step(valLoss);
  console.log(`Epoch ${epoch}: LR = ${scheduler.getLastLr()[0]}`);
}
```
WarmupLR
Linearly increases the learning rate from 0 to the base learning rate over warmupEpochs, then delegates to a wrapped scheduler.
```ts
import { WarmupLR, CosineAnnealingLR } from 'deepbox/optim';

const baseScheduler = new CosineAnnealingLR(optimizer, { T_max: 100 });
const scheduler = new WarmupLR(optimizer, baseScheduler, {
  warmupEpochs: 5
});
```
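The warmup ramp itself is simple linear scaling. A sketch under one common convention (reach the base learning rate on the last warmup epoch); the exact interpolation endpoints may differ in deepbox:

```ts
// Linear warmup toward baseLr over warmupEpochs, then hold (or hand off
// to a wrapped scheduler).
function warmupLr(baseLr: number, warmupEpochs: number, epoch: number): number {
  if (epoch >= warmupEpochs) return baseLr; // warmup finished
  return baseLr * (epoch + 1) / warmupEpochs;
}

console.log([0, 1, 2, 3, 4, 5].map(e => warmupLr(0.1, 5, e)));
// ramps linearly from 0.02 up to 0.1, then holds at 0.1
```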
Constructor
- `optimizer`: The optimizer to schedule
- `afterScheduler`: `LRScheduler | null` (required). Scheduler to use after the warmup period completes. Pass `null` to maintain the base learning rate after warmup.
- `warmupEpochs`: Number of epochs for warmup (must be a positive integer)
Training Example
```ts
// Warmup for 5 epochs, then cosine annealing
const baseScheduler = new CosineAnnealingLR(optimizer, { T_max: 95 });
const scheduler = new WarmupLR(optimizer, baseScheduler, {
  warmupEpochs: 5
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}
```
Warmup without Base Scheduler
```ts
// Just warmup, no scheduler after
const scheduler = new WarmupLR(optimizer, null, { warmupEpochs: 5 });
```
Usage Patterns
Basic Usage
```ts
import { SGD, StepLR } from 'deepbox/optim';

const optimizer = new SGD(model.parameters(), { lr: 0.1 });
const scheduler = new StepLR(optimizer, { stepSize: 10, gamma: 0.1 });

for (let epoch = 0; epoch < 100; epoch++) {
  // Training loop
  for (const batch of dataLoader) {
    optimizer.zeroGrad();
    const loss = computeLoss(batch);
    loss.backward();
    optimizer.step();
  }
  // Step scheduler after each epoch
  scheduler.step();
}
```
Combining Schedulers with Warmup
```ts
import { AdamW, WarmupLR, CosineAnnealingLR } from 'deepbox/optim';

const optimizer = new AdamW(model.parameters(), { lr: 0.001 });
const cosineScheduler = new CosineAnnealingLR(optimizer, {
  T_max: 95,
  etaMin: 1e-6
});
const scheduler = new WarmupLR(optimizer, cosineScheduler, {
  warmupEpochs: 5
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}
```
Metric-Based Scheduling
```ts
import { Adam, ReduceLROnPlateau } from 'deepbox/optim';

const optimizer = new Adam(model.parameters(), { lr: 0.001 });
const scheduler = new ReduceLROnPlateau(optimizer, {
  mode: 'min',
  factor: 0.5,
  patience: 5,
  minLr: 1e-6
});

for (let epoch = 0; epoch < 100; epoch++) {
  const trainLoss = train();
  const valLoss = validate();
  // Scheduler monitors validation loss
  scheduler.step(valLoss);
  const currentLr = scheduler.getLastLr()[0];
  console.log(`Epoch ${epoch}: val_loss=${valLoss}, lr=${currentLr}`);
}
```
Resuming Training
```ts
const optimizer = new SGD(model.parameters(), { lr: 0.1 });
const scheduler = new StepLR(optimizer, {
  stepSize: 10,
  gamma: 0.1,
  lastEpoch: 49 // Resume from epoch 50
});

for (let epoch = 50; epoch < 100; epoch++) {
  train();
  scheduler.step();
}
```
Choosing a Scheduler
- `StepLR`: Simple and effective; good for initial experiments
- `MultiStepLR`: More control over when to decay; common in computer vision
- `ExponentialLR`: Smooth exponential decay
- `CosineAnnealingLR`: Popular for transformers and modern architectures
- `OneCycleLR`: Fast convergence via the super-convergence phenomenon
- `ReduceLROnPlateau`: Adaptive, driven by validation metrics
- `WarmupLR`: Essential for training large models (transformers, vision transformers)