mxnet.lr_scheduler
Scheduling learning rate.
Classes
CosineScheduler | Reduce the learning rate according to a cosine function.
FactorScheduler | Reduce the learning rate by a factor for every n steps.
LRScheduler | Base class of a learning rate scheduler.
MultiFactorScheduler | Reduce the learning rate by a factor at each step in a given list of steps.
PolyScheduler | Reduce the learning rate according to a polynomial of given power.
- class mxnet.lr_scheduler.CosineScheduler(max_update, base_lr=0.01, final_lr=0, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear', epoch_size=None)
Bases: LRScheduler
Reduce the learning rate according to a cosine function.
Calculate the new learning rate by:
final_lr + (start_lr - final_lr) * (1+cos(pi * nup/max_nup))/2 if nup < max_nup, 0 otherwise.
- Parameters:
max_update (int) – maximum number of updates before the decay reaches the final learning rate (final_lr, 0 by default). If epoch_size is supplied, this is the number of epochs over which to decay; otherwise it is a number of minibatch updates.
base_lr (float) – base learning rate
final_lr (float) – final learning rate after all steps
warmup_steps (int) – number of warmup steps used before this scheduler starts decay
warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up
warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
epoch_size (int, optional) – If set, max_update is treated as an epoch count and multiplied by epoch_size (typically len(train_iter)) internally. This matches the convention used by the d2l book’s PyTorch / JAX CosineScheduler implementations, which call .step() once per epoch.
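The cosine formula above can be evaluated by hand to see the shape of the decay. The following is a minimal pure-Python sketch, not the library's code: the function names cosine_lr and effective_max_update are hypothetical, warmup is left out, and holding the rate at final_lr once nup reaches max_nup is an assumption (with the default final_lr=0 it agrees with the "0 otherwise" branch).

```python
import math

def effective_max_update(max_update, epoch_size=None):
    """Convert max_update to minibatch updates when epoch_size is given (as documented)."""
    return max_update if epoch_size is None else max_update * epoch_size

def cosine_lr(nup, max_nup, start_lr=0.01, final_lr=0.0):
    """Evaluate the documented cosine decay formula at update count nup (no warmup)."""
    if nup < max_nup:
        return final_lr + (start_lr - final_lr) * (1 + math.cos(math.pi * nup / max_nup)) / 2
    # Assumption: once the decay is over, the rate stays at final_lr.
    return final_lr

# Decay over 10 epochs of 50 minibatches each, i.e. 500 updates.
max_nup = effective_max_update(10, epoch_size=50)
for nup in (0, 125, 250, 375, 500):
    print(nup, cosine_lr(nup, max_nup, start_lr=0.1))
```

The rate starts at start_lr, passes through the midpoint between start_lr and final_lr halfway through, and ends at final_lr.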
- class mxnet.lr_scheduler.FactorScheduler(step, factor=1, stop_factor_lr=1e-08, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear', epoch_size=None)
Bases: LRScheduler
Reduce the learning rate by a factor for every n steps.
It returns a new learning rate by:
base_lr * pow(factor, floor(num_update/step))
- Parameters:
step (int) – Changes the learning rate for every n updates. If epoch_size is provided, step is interpreted as a number of epochs and multiplied by epoch_size to convert to update steps.
factor (float, optional) – The factor to change the learning rate.
stop_factor_lr (float, optional) – Stop updating the learning rate if it is less than this value.
epoch_size (int, optional) – If set, treat step as an epoch count and multiply it by epoch_size (typically len(train_iter), the number of batches per epoch) so that the scheduler advances at epoch granularity even though Trainer.step() is called per minibatch. This matches the convention used by torch.optim.lr_scheduler.MultiStepLR when its step() is called once per epoch.
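To make the step and epoch_size interaction concrete, here is a small pure-Python sketch of the documented formula base_lr * pow(factor, floor(num_update/step)). The name factor_lr is hypothetical, warmup is omitted, and treating stop_factor_lr as a lower clamp on the result is an assumption about how the floor is applied.

```python
def factor_lr(num_update, step, factor, base_lr=0.01,
              stop_factor_lr=1e-8, epoch_size=None):
    """Evaluate the documented step-decay formula at update count num_update."""
    if epoch_size is not None:
        # step is given in epochs; convert it to minibatch updates.
        step = step * epoch_size
    lr = base_lr * factor ** (num_update // step)
    # Assumption: the rate is never allowed to drop below stop_factor_lr.
    return max(lr, stop_factor_lr)

# Halve the rate every 2 epochs, with 5 minibatches per epoch (every 10 updates).
for nup in (0, 9, 10, 19, 20):
    print(nup, factor_lr(nup, step=2, factor=0.5, base_lr=1.0, epoch_size=5))
```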
- class mxnet.lr_scheduler.LRScheduler(base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')
Bases: object
Base class of a learning rate scheduler.
A scheduler returns a new learning rate based on the number of updates that have been performed.
- Parameters:
base_lr (float, optional) – The initial learning rate.
warmup_steps (int) – number of warmup steps used before this scheduler starts decay
warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up
warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
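The two warmup modes described above can be illustrated with a short pure-Python sketch. The function name warmup_lr is hypothetical, and the sketch assumes that linear warmup climbs from warmup_begin_lr toward base_lr in warmup_steps equal increments; the parameter descriptions do not state the end point explicitly.

```python
def warmup_lr(num_update, warmup_steps, warmup_begin_lr, base_lr, warmup_mode="linear"):
    """Learning rate during warmup, i.e. for num_update < warmup_steps."""
    if warmup_mode == "linear":
        # Assumption: equal increments that would reach base_lr after warmup_steps updates.
        increment = (base_lr - warmup_begin_lr) / warmup_steps
        return warmup_begin_lr + increment * num_update
    if warmup_mode == "constant":
        # The rate is held at warmup_begin_lr for the whole warmup phase.
        return warmup_begin_lr
    raise ValueError("warmup_mode must be 'linear' or 'constant'")

for nup in range(0, 10, 3):
    print(nup,
          warmup_lr(nup, 10, 0.0, 0.1, "linear"),
          warmup_lr(nup, 10, 0.01, 0.1, "constant"))
```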
- class mxnet.lr_scheduler.MultiFactorScheduler(step, factor=1, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear', epoch_size=None)
Bases: LRScheduler
Reduce the learning rate by a factor at each step in a given list of steps.
Assume there exists k such that:
step[k] <= num_update and num_update < step[k+1]
Then calculate the new learning rate by:
base_lr * pow(factor, k+1)
- Parameters:
step (list of int) – The list of steps to schedule a change. Each value is an update count (i.e. one trainer.step() invocation increments the counter by one). If epoch_size is provided, each entry is interpreted as an epoch index and is multiplied by epoch_size internally.
factor (float) – The factor to change the learning rate.
warmup_steps (int) – number of warmup steps used before this scheduler starts decay
warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up
warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
epoch_size (int, optional) – If set, treat each entry of step as an epoch index and multiply by epoch_size (typically len(train_iter)). This makes MultiFactorScheduler(step=[15, 30], epoch_size=num_batches) behave like PyTorch’s MultiStepLR(milestones=[15, 30]) when scheduler.step() is called once per epoch.
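As an illustration of the milestone rule, the sketch below evaluates base_lr * pow(factor, k+1) by counting how many entries of step are at or below num_update, following the inequality as documented. It is pure Python and does not call the library; the name multifactor_lr is hypothetical and warmup is omitted.

```python
def multifactor_lr(num_update, step, factor, base_lr=0.01, epoch_size=None):
    """Evaluate the documented milestone rule at update count num_update."""
    if epoch_size is not None:
        # Entries of step are epoch indices; convert them to minibatch updates.
        step = [s * epoch_size for s in step]
    # Number of milestones already reached, which equals k + 1 in the formula above.
    passed = sum(1 for s in step if s <= num_update)
    return base_lr * factor ** passed

# Milestones at epochs 15 and 30 with 100 minibatches per epoch.
for nup in (0, 1499, 1500, 2999, 3000):
    print(nup, multifactor_lr(nup, [15, 30], 0.1, base_lr=1.0, epoch_size=100))
```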
- class mxnet.lr_scheduler.PolyScheduler(max_update, base_lr=0.01, pwr=2, final_lr=0, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear', epoch_size=None)
Bases: LRScheduler
Reduce the learning rate according to a polynomial of given power.
Calculate the new learning rate, after warmup if any, by:
final_lr + (start_lr - final_lr) * (1-nup/max_nup)^pwr if nup < max_nup, 0 otherwise.
- Parameters:
max_update (int) – maximum number of updates before the decay reaches the final learning rate.
base_lr (float) – base learning rate to start from
pwr (int) – power of the decay term as a function of the current number of updates.
final_lr (float) – final learning rate after all steps
warmup_steps (int) – number of warmup steps used before this scheduler starts decay
warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up
warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
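The polynomial formula above can be checked numerically with a short pure-Python sketch. The name poly_lr is hypothetical, warmup is omitted, and keeping the rate at final_lr once nup reaches max_nup is an assumption (with the default final_lr=0 it agrees with the "0 otherwise" branch).

```python
def poly_lr(nup, max_nup, start_lr=0.01, final_lr=0.0, pwr=2):
    """Evaluate the documented polynomial decay formula at update count nup (no warmup)."""
    if nup < max_nup:
        return final_lr + (start_lr - final_lr) * (1 - nup / max_nup) ** pwr
    # Assumption: once the decay is over, the rate stays at final_lr.
    return final_lr

# With pwr=2 the rate falls quickly at first and flattens near max_nup.
for nup in (0, 25, 50, 75, 100):
    print(nup, poly_lr(nup, 100, start_lr=0.1))
```

Setting pwr=1 in this sketch gives a straight-line decay from start_lr to final_lr.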