ADAM

class ADAM(maxiter=10000, tol=1e-06, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-08, eps=1e-10, amsgrad=False, snapshot_dir=None)

Adam and AMSGRAD optimizer.

Adam
Kingma, Diederik & Ba, Jimmy. (2014).
Adam: A Method for Stochastic Optimization. International Conference on Learning Representations.

Adam is a gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments. The algorithm requires little memory and is invariant to diagonal rescaling of the gradients. Furthermore, it can cope with non-stationary objective functions and with noisy and/or sparse gradients.
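
As an illustration of the published update rule (not the internal code of this class), a single Adam step can be sketched with NumPy; the hyperparameter names mirror the constructor arguments lr, beta_1, beta_2 and noise_factor listed below.

    import numpy as np

    def adam_step(params, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-8):
        # m, v are the first- and second-moment estimates; t is the 1-based step count.
        m = beta_1 * m + (1 - beta_1) * grad          # update biased first-moment estimate
        v = beta_2 * v + (1 - beta_2) * grad ** 2     # update biased second-moment estimate
        m_hat = m / (1 - beta_1 ** t)                 # bias-correct the estimates
        v_hat = v / (1 - beta_2 ** t)
        params = params - lr * m_hat / (np.sqrt(v_hat) + noise_factor)
        return params, m, v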


AMSGRAD
Reddi, Sashank J., Kale, Satyen & Kumar, Sanjiv. (2018).
On the Convergence of Adam and Beyond. International Conference on Learning Representations.

AMSGRAD (a variant of Adam) uses a ‘long-term memory’ of past gradients and thereby improves convergence properties.
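
The AMSGRAD modification is small: it keeps a running maximum of the second-moment estimate and uses that maximum in the denominator, which is the ‘long-term memory’ referred to above. Again, this is only an illustrative sketch of the published variant, not the code of this class.

    import numpy as np

    def amsgrad_step(params, grad, m, v, v_max, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-8):
        m = beta_1 * m + (1 - beta_1) * grad
        v = beta_2 * v + (1 - beta_2) * grad ** 2
        v_max = np.maximum(v_max, v)                  # 'long-term memory': second moment never shrinks
        params = params - lr * m / (np.sqrt(v_max) + noise_factor)   # bias correction omitted, as in the paper
        return params, m, v, v_max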

Parameters
  • maxiter (int) – Maximum number of iterations.

  • tol (float) – Tolerance for termination.

  • lr (float) – Learning rate; value >= 0.

  • beta_1 (float) – Decay rate for the first-moment estimates of the gradient; value in range 0 to 1, generally close to 1.

  • beta_2 (float) – Decay rate for the second-moment estimates of the gradient; value in range 0 to 1, generally close to 1.

  • noise_factor (float) – Noise factor; value >= 0.

  • eps (float) – Epsilon used for the finite-difference gradient approximation if no analytic gradient method is given; value >= 0.

  • amsgrad (bool) – True to use AMSGRAD, False otherwise.

  • snapshot_dir (Optional[str]) – If not None, save the optimizer’s parameters after every step to the given directory.
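
A minimal usage sketch with these parameters follows. It assumes the qiskit.aqua.components.optimizers import path of the Aqua releases this page documents, and it lets the optimizer approximate the gradient by finite differences (controlled by eps) since no gradient function is passed. The return value follows the Optimizer interface: the final point, the objective value at that point, and an evaluation count.

    import numpy as np
    from qiskit.aqua.components.optimizers import ADAM  # import path assumed for Qiskit Aqua

    def objective(x):
        # Simple quadratic with minimum at x = (1, 1).
        return np.sum((x - 1.0) ** 2)

    optimizer = ADAM(maxiter=1000, lr=0.01, amsgrad=True)
    point, value, nfev = optimizer.optimize(
        num_vars=2,
        objective_function=objective,
        initial_point=np.zeros(2),
    )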

Attributes

ADAM.bounds_support_level

Returns bounds support level

ADAM.gradient_support_level

Returns gradient support level

ADAM.initial_point_support_level

Returns initial point support level

ADAM.is_bounds_ignored

Returns whether bounds are ignored

ADAM.is_bounds_required

Returns whether bounds are required

ADAM.is_bounds_supported

Returns whether bounds are supported

ADAM.is_gradient_ignored

Returns whether the gradient is ignored

ADAM.is_gradient_required

Returns whether the gradient is required

ADAM.is_gradient_supported

Returns whether the gradient is supported

ADAM.is_initial_point_ignored

Returns whether the initial point is ignored

ADAM.is_initial_point_required

Returns whether the initial point is required

ADAM.is_initial_point_supported

Returns whether the initial point is supported

ADAM.setting

Return the optimizer settings

Methods

ADAM.get_support_level()

Return support level dictionary

ADAM.gradient_num_diff(x_center, f, epsilon)

Compute the gradient numerically, in parallel, around the point x_center (see the sketch after this list).

ADAM.load_params(load_dir)

Load the optimizer parameters from load_dir.

ADAM.minimize(objective_function, …)

ADAM.optimize(num_vars, objective_function)

Perform optimization.

ADAM.print_options()

Print algorithm-specific options.

ADAM.save_params(snapshot_dir)

Save the optimizer parameters to snapshot_dir.

ADAM.set_max_evals_grouped(limit)

Set the maximum number of objective function evaluations that may be grouped together.

ADAM.set_options(**kwargs)

Sets or updates values in the options dictionary.

ADAM.wrap_function(function, args)

Wrap the function so that the given args are implicitly injected when it is called (see the example below).
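
The two static helpers gradient_num_diff and wrap_function can also be used on their own. A small sketch, assuming the same ADAM import as in the constructor example above:

    import numpy as np
    from qiskit.aqua.components.optimizers import ADAM  # import path assumed for Qiskit Aqua

    def f(x):
        return np.sum(x ** 2)

    # Finite-difference gradient of f around x0; the exact gradient is [2., -4.].
    x0 = np.array([1.0, -2.0])
    grad = ADAM.gradient_num_diff(x0, f, 1e-6)

    # wrap_function appends the given args to every call: g(x) == scaled(x, 2.0).
    def scaled(x, scale):
        return scale * np.sum(x ** 2)

    g = ADAM.wrap_function(scaled, (2.0,))
    g(x0)  # == 10.0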