qiskit.algorithms.optimizers.SPSA¶
-
class
SPSA
(maxiter=100, blocking=False, allowed_increase=None, trust_region=False, learning_rate=None, perturbation=None, last_avg=1, resamplings=1, perturbation_dims=None, second_order=False, regularization=None, hessian_delay=0, lse_solver=None, initial_hessian=None, callback=None)[source]¶ Simultaneous Perturbation Stochastic Approximation (SPSA) optimizer.
SPSA [1] is an gradient descent method for optimizing systems with multiple unknown parameters. As an optimization method, it is appropriately suited to large-scale population models, adaptive modeling, and simulation optimization.
See also
Many examples are presented at the SPSA Web site.
The main feature of SPSA is the stochastic gradient approximation, which requires only two measurements of the objective function, regardless of the dimension of the optimization problem.
Additionally to standard, first-order SPSA, where only gradient information is used, this implementation also allows second-order SPSA (2-SPSA) [2]. In 2-SPSA we also estimate the Hessian of the loss with a stochastic approximation and multiply the gradient with the inverse Hessian to take local curvature into account and improve convergence. Notably this Hessian estimate requires only a constant number of function evaluations unlike an exact evaluation of the Hessian, which scales quadratically in the number of function evaluations.
Note
SPSA can be used in the presence of noise, and it is therefore indicated in situations involving measurement uncertainty on a quantum computation when finding a minimum. If you are executing a variational algorithm using a Quantum ASseMbly Language (QASM) simulator or a real device, SPSA would be the most recommended choice among the optimizers provided here.
The optimization process can includes a calibration phase if neither the
learning_rate
norperturbation
is provided, which requires additional functional evaluations. (Note that either both or none must be set.) For further details on the automatic calibration, please refer to the supplementary information section IV. of [3].Note
This component has some function that is normally random. If you want to reproduce behavior then you should set the random number generator seed in the algorithm_globals (
qiskit.utils.algorithm_globals.random_seed = seed
).Examples
This short example runs SPSA for the ground state calculation of the
Z ^ Z
observable where the ansatz is aPauliTwoDesign
circuit.import numpy as np from qiskit.algorithms.optimizers import SPSA from qiskit.circuit.library import PauliTwoDesign from qiskit.opflow import Z, StateFn ansatz = PauliTwoDesign(2, reps=1, seed=2) observable = Z ^ Z initial_point = np.random.random(ansatz.num_parameters) def loss(x): bound = ansatz.bind_parameters(x) return np.real((StateFn(observable, is_measurement=True) @ StateFn(bound)).eval()) spsa = SPSA(maxiter=300) result = spsa.optimize(ansatz.num_parameters, loss, initial_point=initial_point)
To use the Hessian information, i.e. 2-SPSA, you can add second_order=True to the initializer of the SPSA class, the rest of the code remains the same.
two_spsa = SPSA(maxiter=300, second_order=True) result = two_spsa.optimize(ansatz.num_parameters, loss, initial_point=initial_point)
References
[1]: J. C. Spall (1998). An Overview of the Simultaneous Perturbation Method for Efficient Optimization, Johns Hopkins APL Technical Digest, 19(4), 482–492. Online at jhuapl.edu.
[2]: J. C. Spall (1997). Accelerated second-order stochastic optimization using only function measurements, Proceedings of the 36th IEEE Conference on Decision and Control, 1417-1424 vol.2. Online at IEEE.org.
[3]: A. Kandala et al. (2017). Hardware-efficient Variational Quantum Eigensolver for Small Molecules and Quantum Magnets. Nature 549, pages242–246(2017). arXiv:1704.05018v2
- Parameters
maxiter (
int
) – The maximum number of iterations. Note that this is not the maximal number of function evaluations.blocking (
bool
) – If True, only accepts updates that improve the loss (up to some allowed increase, see next argument).allowed_increase (
Optional
[float
]) – Ifblocking
isTrue
, this argument determines by how much the loss can increase with the proposed parameters and still be accepted. IfNone
, the allowed increases is calibrated automatically to be twice the approximated standard deviation of the loss function.trust_region (
bool
) – IfTrue
, restricts the norm of the update step to be \(\leq 1\).learning_rate (
Union
[float
,array
,Callable
[[],Iterator
],None
]) – The update step is the learning rate is multiplied with the gradient. If the learning rate is a float, it remains constant over the course of the optimization. If a NumPy array, the \(i\)-th element is the learning rate for the \(i\)-th iteration. It can also be a callable returning an iterator which yields the learning rates for each optimization step. Iflearning_rate
is setperturbation
must also be provided.perturbation (
Union
[float
,array
,Callable
[[],Iterator
],None
]) – Specifies the magnitude of the perturbation for the finite difference approximation of the gradients. Seelearning_rate
for the supported types. Ifperturbation
is setlearning_rate
must also be provided.last_avg (
int
) – Return the average of thelast_avg
parameters instead of just the last parameter values.resamplings (
Union
[int
,Dict
[int
,int
]]) – The number of times the gradient (and Hessian) is sampled using a random direction to construct a gradient estimate. Per default the gradient is estimated using only one random direction. If an integer, all iterations use the same number of resamplings. If a dictionary, this is interpreted as{iteration: number of resamplings per iteration}
.perturbation_dims (
Optional
[int
]) – The number of perturbed dimensions. Per default, all dimensions are perturbed, but a smaller, fixed number can be perturbed. If set, the perturbed dimensions are chosen uniformly at random.second_order (
bool
) – If True, use 2-SPSA instead of SPSA. In 2-SPSA, the Hessian is estimated additionally to the gradient, and the gradient is preconditioned with the inverse of the Hessian to improve convergence.regularization (
Optional
[float
]) – To ensure the preconditioner is symmetric and positive definite, the identity times a small coefficient is added to it. This generator yields that coefficient.hessian_delay (
int
) – Start multiplying the gradient with the inverse Hessian only after a certain number of iterations. The Hessian is still evaluated and therefore this argument can be useful to first get a stable average over the last iterations before using it as preconditioner.lse_solver (
Optional
[Callable
[[ndarray
,ndarray
],ndarray
]]) – The method to solve for the inverse of the Hessian. Per default an exact LSE solver is used, but can e.g. be overwritten by a minimization routine.initial_hessian (
Optional
[ndarray
]) – The initial guess for the Hessian. By default the identity matrix is used.callback (
Optional
[Callable
[[int
,ndarray
,float
,float
,bool
],None
]]) – A callback function passed information in each iteration step. The information is, in this order: the number of function evaluations, the parameters, the function value, the stepsize, whether the step was accepted.
- Raises
ValueError – If
learning_rate
orperturbation
is an array with less elements than the number of iterations.
-
__init__
(maxiter=100, blocking=False, allowed_increase=None, trust_region=False, learning_rate=None, perturbation=None, last_avg=1, resamplings=1, perturbation_dims=None, second_order=False, regularization=None, hessian_delay=0, lse_solver=None, initial_hessian=None, callback=None)[source]¶ - Parameters
maxiter (
int
) – The maximum number of iterations. Note that this is not the maximal number of function evaluations.blocking (
bool
) – If True, only accepts updates that improve the loss (up to some allowed increase, see next argument).allowed_increase (
Optional
[float
]) – Ifblocking
isTrue
, this argument determines by how much the loss can increase with the proposed parameters and still be accepted. IfNone
, the allowed increases is calibrated automatically to be twice the approximated standard deviation of the loss function.trust_region (
bool
) – IfTrue
, restricts the norm of the update step to be \(\leq 1\).learning_rate (
Union
[float
,array
,Callable
[[],Iterator
],None
]) – The update step is the learning rate is multiplied with the gradient. If the learning rate is a float, it remains constant over the course of the optimization. If a NumPy array, the \(i\)-th element is the learning rate for the \(i\)-th iteration. It can also be a callable returning an iterator which yields the learning rates for each optimization step. Iflearning_rate
is setperturbation
must also be provided.perturbation (
Union
[float
,array
,Callable
[[],Iterator
],None
]) – Specifies the magnitude of the perturbation for the finite difference approximation of the gradients. Seelearning_rate
for the supported types. Ifperturbation
is setlearning_rate
must also be provided.last_avg (
int
) – Return the average of thelast_avg
parameters instead of just the last parameter values.resamplings (
Union
[int
,Dict
[int
,int
]]) – The number of times the gradient (and Hessian) is sampled using a random direction to construct a gradient estimate. Per default the gradient is estimated using only one random direction. If an integer, all iterations use the same number of resamplings. If a dictionary, this is interpreted as{iteration: number of resamplings per iteration}
.perturbation_dims (
Optional
[int
]) – The number of perturbed dimensions. Per default, all dimensions are perturbed, but a smaller, fixed number can be perturbed. If set, the perturbed dimensions are chosen uniformly at random.second_order (
bool
) – If True, use 2-SPSA instead of SPSA. In 2-SPSA, the Hessian is estimated additionally to the gradient, and the gradient is preconditioned with the inverse of the Hessian to improve convergence.regularization (
Optional
[float
]) – To ensure the preconditioner is symmetric and positive definite, the identity times a small coefficient is added to it. This generator yields that coefficient.hessian_delay (
int
) – Start multiplying the gradient with the inverse Hessian only after a certain number of iterations. The Hessian is still evaluated and therefore this argument can be useful to first get a stable average over the last iterations before using it as preconditioner.lse_solver (
Optional
[Callable
[[ndarray
,ndarray
],ndarray
]]) – The method to solve for the inverse of the Hessian. Per default an exact LSE solver is used, but can e.g. be overwritten by a minimization routine.initial_hessian (
Optional
[ndarray
]) – The initial guess for the Hessian. By default the identity matrix is used.callback (
Optional
[Callable
[[int
,ndarray
,float
,float
,bool
],None
]]) – A callback function passed information in each iteration step. The information is, in this order: the number of function evaluations, the parameters, the function value, the stepsize, whether the step was accepted.
- Raises
ValueError – If
learning_rate
orperturbation
is an array with less elements than the number of iterations.
Methods
__init__
([maxiter, blocking, …])- type maxiter
int
calibrate
(loss, initial_point[, c, …])Calibrate SPSA parameters with a powerseries as learning rate and perturbation coeffs.
estimate_stddev
(loss, initial_point[, avg])Estimate the standard deviation of the loss function.
Get the support level dictionary.
gradient_num_diff
(x_center, f, epsilon[, …])We compute the gradient with the numeric differentiation in the parallel way, around the point x_center.
optimize
(num_vars, objective_function[, …])Perform optimization.
Print algorithm-specific options.
set_max_evals_grouped
(limit)Set max evals grouped
set_options
(**kwargs)Sets or updates values in the options dictionary.
wrap_function
(function, args)Wrap the function to implicitly inject the args at the call of the function.
Attributes
Returns bounds support level
Returns gradient support level
Returns initial point support level
Returns is bounds ignored
Returns is bounds required
Returns is bounds supported
Returns is gradient ignored
Returns is gradient required
Returns is gradient supported
Returns is initial point ignored
Returns is initial point required
Returns is initial point supported
Return setting
The optimizer settings in a dictionary format.
-
property
bounds_support_level
¶ Returns bounds support level
-
static
calibrate
(loss, initial_point, c=0.2, stability_constant=0, target_magnitude=None, alpha=0.602, gamma=0.101, modelspace=False)[source]¶ Calibrate SPSA parameters with a powerseries as learning rate and perturbation coeffs.
The powerseries are:
\[a_k = \frac{a}{(A + k + 1)^\alpha}, c_k = \frac{c}{(k + 1)^\gamma}\]- Parameters
loss (
Callable
[[ndarray
],float
]) – The loss function.initial_point (
ndarray
) – The initial guess of the iteration.c (
float
) – The initial perturbation magnitude.stability_constant (
float
) – The value of A.target_magnitude (
Optional
[float
]) – The target magnitude for the first update step, defaults to \(2\pi / 10\).alpha (
float
) – The exponent of the learning rate powerseries.gamma (
float
) – The exponent of the perturbation powerseries.modelspace (
bool
) – Whether the target magnitude is the difference of parameter values or function values (= model space).
- Returns
- A tuple of powerseries generators, the first one for the
learning rate and the second one for the perturbation.
- Return type
tuple(generator, generator)
-
static
estimate_stddev
(loss, initial_point, avg=25)[source]¶ Estimate the standard deviation of the loss function.
- Return type
float
-
static
gradient_num_diff
(x_center, f, epsilon, max_evals_grouped=1)¶ We compute the gradient with the numeric differentiation in the parallel way, around the point x_center.
- Parameters
x_center (ndarray) – point around which we compute the gradient
f (func) – the function of which the gradient is to be computed.
epsilon (float) – the epsilon used in the numeric differentiation.
max_evals_grouped (int) – max evals grouped
- Returns
the gradient computed
- Return type
grad
-
property
gradient_support_level
¶ Returns gradient support level
-
property
initial_point_support_level
¶ Returns initial point support level
-
property
is_bounds_ignored
¶ Returns is bounds ignored
-
property
is_bounds_required
¶ Returns is bounds required
-
property
is_bounds_supported
¶ Returns is bounds supported
-
property
is_gradient_ignored
¶ Returns is gradient ignored
-
property
is_gradient_required
¶ Returns is gradient required
-
property
is_gradient_supported
¶ Returns is gradient supported
-
property
is_initial_point_ignored
¶ Returns is initial point ignored
-
property
is_initial_point_required
¶ Returns is initial point required
-
property
is_initial_point_supported
¶ Returns is initial point supported
-
optimize
(num_vars, objective_function, gradient_function=None, variable_bounds=None, initial_point=None)[source]¶ Perform optimization.
- Parameters
num_vars (int) – Number of parameters to be optimized.
objective_function (callable) – A function that computes the objective function.
gradient_function (callable) – A function that computes the gradient of the objective function, or None if not available.
variable_bounds (list[(float, float)]) – List of variable bounds, given as pairs (lower, upper). None means unbounded.
initial_point (numpy.ndarray[float]) – Initial point.
- Returns
- point, value, nfev
point: is a 1D numpy.ndarray[float] containing the solution value: is a float with the objective function value nfev: number of objective function calls made if available or None
- Raises
ValueError – invalid input
-
print_options
()¶ Print algorithm-specific options.
-
set_max_evals_grouped
(limit)¶ Set max evals grouped
-
set_options
(**kwargs)¶ Sets or updates values in the options dictionary.
The options dictionary may be used internally by a given optimizer to pass additional optional values for the underlying optimizer/optimization function used. The options dictionary may be initially populated with a set of key/values when the given optimizer is constructed.
- Parameters
kwargs (dict) – options, given as name=value.
-
property
setting
¶ Return setting
-
property
settings
¶ The optimizer settings in a dictionary format.
The settings can for instance be used for JSON-serialization (if all settings are serializable, which e.g. doesn’t hold per default for callables), such that the optimizer object can be reconstructed as
settings = optimizer.settings # JSON serialize and send to another server optimizer = OptimizerClass(**settings)
-
static
wrap_function
(function, args)¶ Wrap the function to implicitly inject the args at the call of the function.
- Parameters
function (func) – the target function
args (tuple) – the args to be injected
- Returns
wrapper
- Return type
function_wrapper