Optimizers¶
The optim module provides optimizers that have not (yet) been added to PyTorch itself.
LAMB¶
class pyblaze.optim.lamb.LAMB(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, lr_decay=0)[source]¶
Optimizer presented in “Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes” (You et al., 2019).
The LAMB optimizer (“Layer-wise Adaptive Moments optimizer for Batch training”) enables training with very large batch sizes and provides an alternative to Adam, whose performance deteriorates as the batch size grows.
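Conceptually, and following the paper rather than the details of this particular implementation (bias correction and lr_decay are omitted here), the per-layer update for weights \(w_t\) with gradient \(g_t\), weight decay \(\lambda\), and learning rate \(\eta\) looks roughly like:

\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \\
r_t &= \frac{m_t}{\sqrt{v_t} + \epsilon} + \lambda w_t, &
w_{t+1} &= w_t - \eta\, \frac{\lVert w_t \rVert}{\lVert r_t \rVert}\, r_t.
\end{aligned}
\]

The trust ratio \(\lVert w_t \rVert / \lVert r_t \rVert\) rescales the Adam-style step separately for each layer, which is what keeps training stable at very large batch sizes.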
__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, lr_decay=0)[source]¶
Initializes a new LAMB optimizer.
Parameters
params (iterable of torch.Tensor or dict of str -> torch.Tensor) – The parameters to optimize, optionally overriding default values for parameter groups.
lr (float, default: 1e-3) – The learning rate.
betas (tuple of (float, float), default: (0.9, 0.999)) – The coefficients used to compute the running averages of the gradient and its square.
eps (float, default: 1e-8) – Epsilon parameter for numerical stability.
weight_decay (float, default: 0) – L2 penalty to apply.
lr_decay (float, default: 0) – Learning rate decay over each update.
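A minimal usage sketch, assuming LAMB follows the standard torch.optim.Optimizer interface (zero_grad and step); the linear model and the random batch are placeholders:

import torch
import torch.nn as nn
import torch.nn.functional as F
from pyblaze.optim.lamb import LAMB

# Placeholder model and data; any torch.nn.Module and data loader work the same way.
model = nn.Linear(128, 10)
optimizer = LAMB(model.parameters(), lr=1e-3, weight_decay=0.01)

inputs = torch.randn(4096, 128)           # LAMB is designed for large batch sizes
targets = torch.randint(0, 10, (4096,))

optimizer.zero_grad()
loss = F.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()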