ADAM
- class ADAM(maxiter=10000, tol=1e-06, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-08, eps=1e-10, amsgrad=False, snapshot_dir=None)
Adam and AMSGRAD optimizer.
Adam [Kingma, Diederik & Ba, Jimmy (2014). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations] is a gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments. The algorithm requires little memory and is invariant to diagonal rescaling of the gradients. Furthermore, it is able to cope with non-stationary objective functions and noisy and/or sparse gradients.
AMSGRAD [Sashank J. Reddi, Satyen Kale and Sanjiv Kumar (2018). On the Convergence of Adam and Beyond. International Conference on Learning Representations], a variant of Adam, uses a ‘long-term memory’ of past gradients and thereby improves convergence properties.
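The snippet below is a minimal usage sketch, not part of the class reference itself. The import path is an assumption (the Aqua-era qiskit.aqua.components.optimizers; newer releases expose the class under qiskit.algorithms.optimizers), and the objective function is a simple convex test function chosen for illustration.

```python
# Minimal usage sketch (assumed Aqua-era import path and API).
import numpy as np
from qiskit.aqua.components.optimizers import ADAM

# Simple convex objective: f(x) = (x0 - 2)^2 + (x1 + 1)^2, minimum at (2, -1).
def objective(x):
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2

adam = ADAM(maxiter=500, lr=0.1)                  # plain Adam
# adam = ADAM(maxiter=500, lr=0.1, amsgrad=True)  # AMSGRAD variant

# No gradient_function is given, so the gradient is approximated by finite
# differences with step size eps.
point, value, nfev = adam.optimize(
    num_vars=2,
    objective_function=objective,
    initial_point=np.array([0.0, 0.0]),
)
print(point, value, nfev)
```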
- Parameters
  - maxiter (int) – Maximum number of iterations
  - tol (float) – Tolerance for termination
  - lr (float) – Value >= 0, learning rate
  - beta_1 (float) – Value in range 0 to 1, generally close to 1
  - beta_2 (float) – Value in range 0 to 1, generally close to 1
  - noise_factor (float) – Value >= 0, noise factor
  - eps (float) – Value >= 0, epsilon to be used for finite differences if no analytic gradient method is given
  - amsgrad (bool) – True to use AMSGRAD, False if not
  - snapshot_dir (Optional[str]) – If not None, save the optimizer’s parameters after every step to the given directory
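For illustration, the sketch below constructs ADAM with each parameter set explicitly; the import path and the snapshot directory are assumptions made for the example, not values taken from this page.

```python
# Hedged sketch: constructing ADAM with explicit hyperparameters
# (import path assumed as in the example above).
from qiskit.aqua.components.optimizers import ADAM

adam = ADAM(
    maxiter=5000,        # maximum number of iterations
    tol=1e-6,            # tolerance for termination
    lr=0.01,             # learning rate (value >= 0)
    beta_1=0.9,          # first-moment decay rate, generally close to 1
    beta_2=0.999,        # second-moment decay rate, generally close to 1
    noise_factor=1e-8,   # noise factor (value >= 0)
    eps=1e-10,           # finite-difference step if no analytic gradient is given
    amsgrad=True,        # use the AMSGRAD variant
    snapshot_dir="./adam_snapshots",  # hypothetical directory; must exist and be writable
)
```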
Attributes
bounds_support_level – Returns bounds support level
gradient_support_level – Returns gradient support level
initial_point_support_level – Returns initial point support level
is_bounds_ignored – Returns is bounds ignored
is_bounds_required – Returns is bounds required
is_bounds_supported – Returns is bounds supported
is_gradient_ignored – Returns is gradient ignored
is_gradient_required – Returns is gradient required
is_gradient_supported – Returns is gradient supported
is_initial_point_ignored – Returns is initial point ignored
is_initial_point_required – Returns is initial point required
is_initial_point_supported – Returns is initial point supported
setting – Return setting
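These attributes come from the common Optimizer interface; the brief sketch below (same assumed import path as above) shows how they can be inspected on an ADAM instance.

```python
# Sketch: inspecting the inherited support-level attributes.
from qiskit.aqua.components.optimizers import ADAM

adam = ADAM()
print(adam.gradient_support_level)     # level of gradient support
print(adam.is_bounds_ignored)          # whether variable bounds are ignored
print(adam.is_initial_point_required)  # whether an initial point must be supplied
print(adam.setting)                    # formatted summary of the optimizer settings
```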
Methods
ADAM.get_support_level() – Return support level dictionary
ADAM.gradient_num_diff(x_center, f, epsilon) – Compute the gradient by numeric differentiation, in parallel, around the point x_center.
ADAM.load_params(load_dir) – Load optimizer parameters from the given directory.
ADAM.minimize(objective_function, …) – Run the minimization.
ADAM.optimize(num_vars, objective_function) – Perform optimization.
ADAM.print_options() – Print algorithm-specific options.
ADAM.save_params(snapshot_dir) – Save optimizer parameters to the given directory.
ADAM.set_max_evals_grouped(limit) – Set max evals grouped.
ADAM.set_options(**kwargs) – Sets or updates values in the options dictionary.
ADAM.wrap_function(function, args) – Wrap the function to implicitly inject the args at the call of the function.
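To tie some of these methods together, here is a hedged sketch (assumed Aqua-era import path and signatures) that evaluates a numerical gradient, updates the options dictionary, runs an optimization, and saves and reloads the optimizer parameters.

```python
# Sketch combining several of the methods listed above.
import os
import numpy as np
from qiskit.aqua.components.optimizers import ADAM

def objective(x):
    return float(np.sum(np.asarray(x) ** 2))

adam = ADAM(maxiter=100, lr=0.05)

# Static helper: numerical gradient of `objective` around a point.
grad = ADAM.gradient_num_diff(np.array([1.0, -2.0]), objective, epsilon=1e-6)

# Sets or updates a value in the options dictionary.
adam.set_options(tol=1e-8)

# Run an optimization so that the internal moment estimates exist.
point, value, nfev = adam.optimize(2, objective, initial_point=np.zeros(2))

# Save the optimizer parameters and load them back
# ("./adam_state" is a hypothetical directory).
os.makedirs("./adam_state", exist_ok=True)
adam.save_params("./adam_state")
adam.load_params("./adam_state")
```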