LAMB
Layer-wise Adaptive Moments optimizer for Batch training (LAMB) combines the benefits of the Adam and LARS optimizers: it pairs Adam's per-parameter moment estimates with LARS's layer-wise trust ratio, which makes it well suited to training with very large batch sizes.
Learn more about LAMB in the original paper, Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes (You et al., 2019).
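As a rough sketch of the idea (a NumPy illustration of the published update rule, not tinygrad's implementation; lamb_step and its arguments are hypothetical names), each step forms Adam-style moment estimates and then scales the resulting per-layer update by a LARS-style trust ratio:
import numpy as np

def lamb_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8, wd=0.0):
    # Adam-style exponential moving averages of the gradient and its square
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    # bias-corrected moment estimates
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Adam-style update direction, plus weight decay on the parameters
    update = m_hat / (np.sqrt(v_hat) + eps) + wd * w
    # LARS-style layer-wise trust ratio: scale the step by ||w|| / ||update||
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * trust_ratio * update, m, v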
Usage
from tinygrad.nn.state import get_parameters
from tinygrad.nn import optim
from models.resnet import ResNet

model = ResNet()
optimizer = optim.LAMB(get_parameters(model), lr=0.01)

for _ in range(5):
    ...  # train, eval
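A single training step inside that loop might look like the following sketch; the input batch x, the labels y, and the loss function are placeholder assumptions rather than part of the original example, while zero_grad, backward, and step are the standard tinygrad calls:
from tinygrad.tensor import Tensor

Tensor.training = True  # enable training mode (recent tinygrad versions assert this in optimizer.step)
for _ in range(5):
    out = model(x)                                 # x: placeholder input batch Tensor
    loss = out.sparse_categorical_crossentropy(y)  # y: placeholder integer label Tensor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()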
Arguments
params
The parameters of the model to optimize.
lr (default: 0.001)
The learning rate of the optimizer.
b1 (default: 0.9)
The exponential decay rate for the first moment estimates.
b2 (default: 0.999)
The exponential decay rate for the second moment estimates.
eps (default: 1e-8)
The epsilon value for numerical stability.
wd (default: 0.0)
The weight decay value to apply.
adam (default: False)
If set to True, the layer-wise trust ratio is disabled and the optimizer behaves like plain Adam with the given weight decay.
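For instance, assuming the keyword argument is spelled adam as listed above, the following sketch turns the flag on; with it set, the trust ratio is skipped and the update reduces to an Adam-style step:
# hedged sketch: same hyperparameters as above, but with the trust ratio disabled
adam_like = optim.LAMB(get_parameters(model), lr=0.001, b1=0.9, b2=0.999,
                       eps=1e-8, wd=0.0, adam=True)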