The optim module provides optimizers that have not (yet) been added to PyTorch itself.


class pyblaze.optim.lamb.LAMB(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, lr_decay=0)

Optimizer presented in “Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes” (You et al., 2019).

The LAMB optimizer (“Layer-wise Adaptive Moments optimizer for Batch training”) enables training on very large batches. It provides an alternative to Adam, whose performance deteriorates for large batches.
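To make the "layer-wise" idea concrete, the following is a minimal sketch of one LAMB update for a single layer, written in plain Python on flat lists of floats. It follows the update rule as described in the paper and is not taken from the pyblaze source; the function name lamb_step and its argument layout are hypothetical, and the lr_decay parameter is left out because its exact semantics are not specified here.

```python
import math


def lamb_step(weights, grads, m, v, step, lr=1e-3, betas=(0.9, 0.999),
              eps=1e-8, weight_decay=0.0):
    """Apply one LAMB update to a single layer (hypothetical helper).

    All tensors are flat lists of floats; `step` is the 1-based update count.
    Returns the new (weights, m, v).
    """
    beta1, beta2 = betas
    # Running averages of the gradient and of its square, as in Adam.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grads)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grads)]
    # Bias correction for the zero-initialized averages.
    m_hat = [mi / (1 - beta1 ** step) for mi in m]
    v_hat = [vi / (1 - beta2 ** step) for vi in v]
    # Adam-style direction with the weight decay term added to it.
    update = [mh / (math.sqrt(vh) + eps) + weight_decay * w
              for mh, vh, w in zip(m_hat, v_hat, weights)]
    # Layer-wise trust ratio ||w|| / ||update||; fall back to 1 if a norm is 0.
    w_norm = math.sqrt(sum(w * w for w in weights))
    u_norm = math.sqrt(sum(u * u for u in update))
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    new_weights = [w - lr * trust * u for w, u in zip(weights, update)]
    return new_weights, m, v


# One update on a toy two-parameter "layer".
old_w = [3.0, 4.0]
new_w, m, v = lamb_step(old_w, [0.5, -0.5], m=[0.0, 0.0], v=[0.0, 0.0],
                        step=1, lr=0.1)
step_length = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_w, old_w)))
```

With zero weight decay, the trust ratio rescales the step so that its length equals lr times the norm of the layer's weights, independent of the gradient's magnitude. This per-layer normalization is what distinguishes the sketch from a plain Adam step.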

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, lr_decay=0)

Initializes a new LAMB optimizer.

  • params (iterable of torch.Tensor or dict of str -> torch.Tensor) – The parameters to optimize, optionally overriding default values for parameter groups.

  • lr (float, default: 1e-3) – The learning rate.

  • betas (tuple of (float, float), default: (0.9, 0.999)) – The coefficients used to compute the running averages of the gradient and its square.

  • eps (float, default: 1e-8) – Epsilon parameter for numerical stability.

  • weight_decay (float, default: 0) – L2 penalty to apply.

  • lr_decay (float, default: 0) – Learning rate decay over each update.


step(closure=None)

Performs a single optimization step.


  • closure (callable, default: None) – A closure that reevaluates the model and returns the loss.