mxnet.optimizer.optimizer
Base Optimizer class.
Functions
| create(name, **kwargs) | Instantiates an optimizer with a given name and kwargs. |
| register(klass) | Registers a new optimizer. |
Classes
| Optimizer | The base class inherited by all optimizers. |
| | The Test optimizer. |
- class mxnet.optimizer.optimizer.Optimizer(rescale_grad=1.0, param_idx2name=None, wd=0.0, clip_gradient=None, learning_rate=None, lr_scheduler=None, sym=None, begin_num_update=0, multi_precision=False, param_dict=None, aggregate_num=None, use_fused_step=None, **kwargs)
Bases: object
The base class inherited by all optimizers.
- Parameters:
rescale_grad (float, optional, default 1.0) – Multiply the gradient with rescale_grad before updating. Often chosen to be 1.0/batch_size.
param_idx2name (dict from int to string, optional, default None) – A dictionary that maps int index to string name.
clip_gradient (float, optional, default None) – Clip the gradient by projecting onto the box [-clip_gradient, clip_gradient].
learning_rate (float, optional, default None) – The initial learning rate. If None, the optimizer uses the learning rate from lr_scheduler. If not None, it overwrites the learning rate in lr_scheduler. If both learning_rate and lr_scheduler are None, the learning rate defaults to 0.01.
lr_scheduler (LRScheduler, optional, default None) – The learning rate scheduler.
wd (float, optional, default 0.0) – The weight decay (or L2 regularization) coefficient. It modifies the objective by adding a penalty on large weights.
sym (Symbol, optional, default None) – The Symbol this optimizer is applying to.
begin_num_update (int, optional, default 0) – The initial number of updates.
multi_precision (bool, optional, default False) – Flag to control the internal precision of the optimizer. If False (default), the optimizer uses the same precision as the weights. If True, it keeps an internal 32-bit copy of the weights and applies gradients in 32-bit precision even if the weights used in the model have lower precision. Turning this on can improve convergence and accuracy when training with float16.
param_dict (dict of int -> gluon.Parameter, default None) – Dictionary of parameter index to gluon.Parameter, used to look up parameter attributes such as lr_mult and wd_mult. param_dict shall not be deep copied.
aggregate_num (int, optional, default None) – Number of weights to be aggregated in a list and passed to the optimizer for a single optimization step. By default, only one weight is aggregated. When aggregate_num is set to numpy.inf, all the weights are aggregated.
use_fused_step (bool, optional, default None) – Whether to use fused kernels for the optimizer. When use_fused_step=False, step is called; otherwise, fused_step is called.
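To make the interplay of rescale_grad, clip_gradient, wd and the learning rate concrete, here is a minimal pure-Python sketch of one plain SGD step on lists of floats. It assumes the ordering rescale, then clip, then add the weight-decay term; sgd_update_sketch is a hypothetical helper, not MXNet's kernel.

```python
def sgd_update_sketch(weight, grad, lr=0.01, rescale_grad=1.0,
                      clip_gradient=None, wd=0.0):
    """One plain SGD step on lists of floats (hypothetical helper, not MXNet's kernel)."""
    new_weight = []
    for w, g in zip(weight, grad):
        g = g * rescale_grad                       # e.g. rescale_grad = 1.0/batch_size
        if clip_gradient is not None:              # project onto [-clip_gradient, clip_gradient]
            g = max(-clip_gradient, min(clip_gradient, g))
        g = g + wd * w                             # L2 penalty adds wd * w to the gradient
        new_weight.append(w - lr * g)
    return new_weight


# Gradients summed over a batch of 4 are rescaled to a mean, then clipped at 5.0.
updated = sgd_update_sketch([1.0, -2.0], [8.0, -40.0], lr=0.1,
                            rescale_grad=1.0 / 4, clip_gradient=5.0)
```

Here the second gradient rescales to -10.0 and is clipped to -5.0 before the learning rate is applied.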
- Properties:
learning_rate – The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate.
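The precedence between learning_rate and lr_scheduler described in the parameter list can be written out as a small function. This is a sketch assuming the scheduler exposes its starting rate as a base_lr attribute; SchedulerStub and resolve_initial_lr are hypothetical names, not MXNet API.

```python
class SchedulerStub:
    """Hypothetical stand-in for an LRScheduler that only carries base_lr."""

    def __init__(self, base_lr):
        self.base_lr = base_lr


def resolve_initial_lr(learning_rate=None, lr_scheduler=None):
    """Apply the documented learning_rate / lr_scheduler rules (illustrative only)."""
    if learning_rate is not None:
        if lr_scheduler is not None:
            lr_scheduler.base_lr = learning_rate   # an explicit value overwrites the scheduler's
        return learning_rate
    if lr_scheduler is not None:
        return lr_scheduler.base_lr                # otherwise defer to the scheduler
    return 0.01                                    # neither given: documented default


sched = SchedulerStub(0.5)
lr = resolve_initial_lr(0.1, sched)                # explicit value wins and is pushed into sched
```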
- static create_optimizer(name, **kwargs)
Instantiates an optimizer with a given name and kwargs.
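Note
We can use the alias create for Optimizer.create_optimizer.
- Parameters:
name (str) – Name of the optimizer to create, i.e. the name of a registered subclass of Optimizer (case-insensitive).
kwargs – Keyword arguments passed to the optimizer's constructor.
- Returns:
An instantiated optimizer.
- Return type:
Optimizer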
Examples
>>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd')
>>> type(sgd)
<class 'mxnet.optimizer.SGD'>
>>> adam = mx.optimizer.create('adam', learning_rate=.1)
>>> type(adam)
<class 'mxnet.optimizer.Adam'>
- create_state(index, weight)
Creates auxiliary state for a given weight.
Some optimizers require additional states, such as momentum, in addition to gradients in order to update weights. This function creates the state for a given weight, which will be used in update. This function is called only once for each weight.
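The contract that create_state runs once per weight, with its result reused by every later update, can be sketched with a toy momentum optimizer on plain Python lists. MomentumStateSketch is hypothetical; caching the states inside the optimizer is a simplification made for this sketch, not a claim about where MXNet keeps them.

```python
class MomentumStateSketch:
    """Toy momentum optimizer illustrating per-weight auxiliary state (hypothetical)."""

    def __init__(self, learning_rate=0.1, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.states = {}                      # index -> momentum buffer
        self.create_calls = 0

    def create_state(self, index, weight):
        self.create_calls += 1
        return [0.0] * len(weight)            # one zero-initialized buffer per weight

    def update(self, index, weight, grad):
        if index not in self.states:          # create the state only on first use
            self.states[index] = self.create_state(index, weight)
        mom = self.states[index]
        for i, g in enumerate(grad):
            mom[i] = self.momentum * mom[i] - self.learning_rate * g
            weight[i] += mom[i]               # update the weight in place


opt = MomentumStateSketch(learning_rate=0.1, momentum=0.9)
w = [1.0]
opt.update(0, w, [1.0])                       # creates the state, w becomes 0.9
opt.update(0, w, [1.0])                       # reuses the state, w becomes 0.71
```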
- create_state_multi_precision(index, weight)
Creates auxiliary state for a given weight, including a high-precision FP32 copy if the original weight is FP16.
This method is provided to perform automatic mixed precision training for optimizers that do not support it themselves.
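Why the FP32 copy matters can be shown in pure Python, using the standard struct module's half ('e') and single ('f') formats to emulate rounding. This is a minimal sketch assuming a plain SGD step with a constant gradient; it illustrates the numerical effect only, not how MXNet stores or applies the copy.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]


lr, grad, steps = 1e-4, 1.0, 100

# Without a master copy: each update is rounded straight back to FP16.
w16 = to_fp16(1.0)
for _ in range(steps):
    w16 = to_fp16(w16 - lr * grad)       # 1.0 - 1e-4 rounds back to 1.0 in FP16

# With an FP32 master copy: updates accumulate in FP32, then are cast to FP16.
master = to_fp32(1.0)
for _ in range(steps):
    master = to_fp32(master - lr * grad)
w16_from_master = to_fp16(master)
```

The FP16-only weight never moves because each step is smaller than half the FP16 spacing near 1.0, while the master copy accumulates all 100 steps.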
- fused_step(indices, weights, grads, states)
Perform a fused optimization step using gradients and states. New operators that fuse the optimizer's update should be put in this function.
- Parameters:
indices (list of int) – List of unique parameter indices, used to look up each parameter's individual learning rate and weight decay. The per-parameter multipliers for these may be set via set_lr_mult() and set_wd_mult(), respectively.
weights (list of NDArray) – List of parameters to be updated.
grads (list of NDArray) – List of gradients of the objective with respect to these parameters.
states (list of any obj) – List of states returned by create_state().
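Because fused_step receives lists, the aggregate_num constructor argument decides how many weights travel together in one call. A sketch of that grouping, assuming indices are chunked in order; group_indices is a hypothetical helper, and math.inf compares equal to numpy.inf.

```python
import math


def group_indices(indices, aggregate_num=None):
    """Split parameter indices into the lists handed to one (fused) step call.

    None is treated as 1 (one weight per call); math.inf puts everything in one list.
    """
    if aggregate_num is None:
        aggregate_num = 1
    if aggregate_num == math.inf:
        return [list(indices)] if indices else []
    size = int(aggregate_num)
    return [list(indices[i:i + size]) for i in range(0, len(indices), size)]


batches = group_indices([0, 1, 2, 3, 4], aggregate_num=2)   # three calls: 2 + 2 + 1 weights
```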
- static register(klass)
Registers a new optimizer.
Once an optimizer is registered, we can create an instance of this optimizer with create_optimizer later.
Examples
>>> @mx.optimizer.Optimizer.register
... class MyOptimizer(mx.optimizer.Optimizer):
...     pass
>>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer')
>>> print(type(optim))
<class '__main__.MyOptimizer'>
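The register/create_optimizer pair behaves like a name-keyed registry. The sketch below reproduces that pattern in plain Python so it runs without MXNet; SketchOptimizer is hypothetical, and lower-casing the key is this sketch's way of making lookups case-insensitive (consistent with 'sgd' resolving to SGD in the create_optimizer example).

```python
class SketchOptimizer:
    """Hypothetical base class with a name-keyed registry, mimicking register/create_optimizer."""
    opt_registry = {}

    def __init__(self, learning_rate=0.01, **kwargs):
        self.learning_rate = learning_rate

    @staticmethod
    def register(klass):
        # Key by the lower-cased class name so lookups ignore case.
        SketchOptimizer.opt_registry[klass.__name__.lower()] = klass
        return klass                      # returning klass lets this work as a decorator

    @staticmethod
    def create_optimizer(name, **kwargs):
        key = name.lower()
        if key not in SketchOptimizer.opt_registry:
            raise ValueError('Cannot find optimizer %s' % name)
        return SketchOptimizer.opt_registry[key](**kwargs)


@SketchOptimizer.register
class MyOptimizer(SketchOptimizer):
    pass


optim = SketchOptimizer.create_optimizer('MyOptimizer', learning_rate=0.5)
```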
- set_learning_rate(lr)
Sets a new learning rate of the optimizer.
- Parameters:
lr (float) – The new learning rate of the optimizer.
- set_lr_mult(args_lr_mult)
Sets an individual learning rate multiplier for each parameter.
If you specify a learning rate multiplier for a parameter, then the learning rate for the parameter will be set as the product of the global learning rate self.lr and its multiplier.
Note
The default learning rate multiplier of a Variable can be set with the lr_mult argument in its constructor.
- Parameters:
args_lr_mult (dict of str/int to float) –
For each of its key-value entries, the learning rate multiplier for the parameter specified in the key will be set to the given value.
You can specify the parameter with either its name or its index. If you use the name, you should pass sym in the constructor, and the name you specified in the key of args_lr_mult should match the name of the parameter in sym. If you use the index, it should correspond to the index of the parameter used in the update method.
Specifying a parameter by its index is only supported for backward compatibility; we recommend using the name instead.
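A sketch of how the effective learning rate (global rate times multiplier) could be resolved when the multiplier dictionary is keyed either by parameter name, via an index-to-name map such as param_idx2name, or by a legacy integer index. effective_lr is a hypothetical helper, and checking the index before the name is this sketch's own choice.

```python
def effective_lr(index, base_lr, lr_mult, idx2name):
    """Hypothetical helper: global learning rate times the parameter's multiplier.

    lr_mult may be keyed by integer index (legacy) or by parameter name;
    parameters without an entry use a multiplier of 1.0.
    """
    if index in lr_mult:
        mult = lr_mult[index]
    elif idx2name.get(index) in lr_mult:
        mult = lr_mult[idx2name[index]]
    else:
        mult = 1.0
    return base_lr * mult


idx2name = {0: 'fc1_weight', 1: 'fc1_bias'}
lr_mult = {'fc1_bias': 2.0}                     # keyed by name, as recommended
bias_lr = effective_lr(1, 0.1, lr_mult, idx2name)
```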
- set_wd_mult(args_wd_mult)
Sets an individual weight decay multiplier for each parameter.
Note
The default weight decay multiplier for a Variable can be set with its wd_mult argument in the constructor.
- Parameters:
args_wd_mult (dict of string/int to float) –
For each of its key-value entries, the weight decay multiplier for the parameter specified in the key will be set to the given value.
You can specify the parameter with either its name or its index. If you use the name, you should pass sym in the constructor, and the name you specified in the key of args_wd_mult should match the name of the parameter in sym. If you use the index, it should correspond to the index of the parameter used in the update method.
Specifying a parameter by its index is only supported for backward compatibility; we recommend using the name instead.
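A weight decay multiplier scales the L2 term that wd adds to a parameter's gradient. The sketch below shows that scaling on plain lists; decayed_grad is a hypothetical helper, and using a multiplier of 0.0 for a bias is an illustrative choice, not a statement about MXNet's defaults.

```python
def decayed_grad(grad, weight, wd, wd_mult=1.0):
    """Hypothetical helper: add the L2 term wd * wd_mult * weight to a gradient."""
    return [g + wd * wd_mult * w for g, w in zip(grad, weight)]


weight = [2.0, -4.0]
grad = [0.5, 0.5]
# A multiplier of 0.0 switches weight decay off for this parameter (e.g. a bias).
no_decay = decayed_grad(grad, weight, wd=0.1, wd_mult=0.0)
# A multiplier of 0.5 halves the decay applied to this parameter.
half_decay = decayed_grad(grad, weight, wd=0.1, wd_mult=0.5)
```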
- step(indices, weights, grads, states)
Perform an optimization step using gradients and states.
- Parameters:
indices (list of int) – List of unique parameter indices, used to look up each parameter's individual learning rate and weight decay. The per-parameter multipliers for these may be set via set_lr_mult() and set_wd_mult(), respectively.
weights (list of NDArray) – List of parameters to be updated.
grads (list of NDArray) – List of gradients of the objective with respect to these parameters.
states (list of any obj) – List of states returned by create_state().
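Since step takes parallel lists, an implementation typically walks indices, weights, grads and states together. Below is a minimal sketch of a custom optimizer whose step applies a sign-of-gradient update (signSGD) in place; plain Python lists stand in for NDArray, and the class name and its lr_mult dictionary are hypothetical.

```python
class SignSGDSketch:
    """Hypothetical optimizer exposing the documented step() signature.

    Weights and gradients are plain Python lists here instead of NDArrays.
    """

    def __init__(self, learning_rate=0.01, lr_mult=None):
        self.learning_rate = learning_rate
        self.lr_mult = lr_mult or {}

    def create_state(self, index, weight):
        return None                                  # stateless: no auxiliary buffers

    def step(self, indices, weights, grads, states):
        # Walk the lists in lockstep; each index selects its own multiplier.
        for index, weight, grad, _state in zip(indices, weights, grads, states):
            lr = self.learning_rate * self.lr_mult.get(index, 1.0)
            for i, g in enumerate(grad):
                weight[i] -= lr * ((g > 0) - (g < 0))  # sign(g), applied in place


opt = SignSGDSketch(learning_rate=0.1, lr_mult={1: 2.0})
ws = [[1.0, 1.0], [0.0]]
states = [opt.create_state(i, w) for i, w in enumerate(ws)]
opt.step([0, 1], ws, [[3.0, -0.5], [5.0]], states)
```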
- update(indices, weights, grads, states)
Calls step to perform a single optimization update if use_fused_step is False; otherwise, fused_step is called.
- Parameters:
indices (list of int) – List of unique parameter indices, used to look up each parameter's individual learning rate and weight decay. The per-parameter multipliers for these may be set via set_lr_mult() and set_wd_mult(), respectively.
weights (list of NDArray) – List of parameters to be updated.
grads (list of NDArray) – List of gradients of the objective with respect to these parameters.
states (list of any obj) – List of states returned by create_state().
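The dispatch performed by update reduces to a single branch on use_fused_step. This sketch records which method ran so the branch is observable; DispatchSketch is hypothetical and leaves out anything beyond the dispatch itself.

```python
class DispatchSketch:
    """Hypothetical stand-in showing how update() picks step or fused_step."""

    def __init__(self, use_fused_step=False):
        self.use_fused_step = use_fused_step
        self.calls = []                              # records which path was taken

    def step(self, indices, weights, grads, states):
        self.calls.append(('step', list(indices)))

    def fused_step(self, indices, weights, grads, states):
        self.calls.append(('fused_step', list(indices)))

    def update(self, indices, weights, grads, states):
        if self.use_fused_step:
            self.fused_step(indices, weights, grads, states)
        else:
            self.step(indices, weights, grads, states)


plain = DispatchSketch(use_fused_step=False)
plain.update([0, 1], [[], []], [[], []], [None, None])
fused = DispatchSketch(use_fused_step=True)
fused.update([0, 1], [[], []], [[], []], [None, None])
```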
- update_multi_precision(indices, weights, grads, states)
Mixed-precision version of update: calls step to perform a single optimization update if use_fused_step is False; otherwise, fused_step is called.
- Parameters:
indices (list of int) – List of unique parameter indices, used to look up each parameter's individual learning rate and weight decay. The per-parameter multipliers for these may be set via set_lr_mult() and set_wd_mult(), respectively.
weights (list of NDArray) – List of parameters to be updated.
grads (list of NDArray) – List of gradients of the objective with respect to these parameters.
states (list of any obj) – List of states returned by create_state().
- mxnet.optimizer.optimizer.create(name, **kwargs)
Instantiates an optimizer with a given name and kwargs.
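Note
We can use the alias create for Optimizer.create_optimizer.
- Parameters:
name (str) – Name of the optimizer to create, i.e. the name of a registered subclass of Optimizer (case-insensitive).
kwargs – Keyword arguments passed to the optimizer's constructor.
- Returns:
An instantiated optimizer.
- Return type:
Optimizer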
Examples
>>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd')
>>> type(sgd)
<class 'mxnet.optimizer.SGD'>
>>> adam = mx.optimizer.create('adam', learning_rate=.1)
>>> type(adam)
<class 'mxnet.optimizer.Adam'>
- mxnet.optimizer.optimizer.register(klass)
Registers a new optimizer.
Once an optimizer is registered, we can create an instance of this optimizer with create_optimizer later.
Examples
>>> @mx.optimizer.Optimizer.register
... class MyOptimizer(mx.optimizer.Optimizer):
...     pass
>>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer')
>>> print(type(optim))
<class '__main__.MyOptimizer'>