mxnet.gluon.parameter

Neural network parameter.

Classes

Constant(value)

A constant parameter for holding immutable tensors.

Parameter([name, grad_req, shape, dtype, ...])

A Container holding parameters (weights) of Blocks.

Exceptions

DeferredInitializationError

Error for unfinished deferred initialization.

class mxnet.gluon.parameter.Constant(value)[source]

Bases: Parameter

A constant parameter for holding immutable tensors. Constants are ignored by autograd and Trainer, so their values do not change during training. You can still update their values manually with the set_data method.

Constants can be created with either:

const = mx.gluon.Constant([[1,2],[3,4]])

or:

class Block(gluon.Block):
    def __init__(self, **kwargs):
        super(Block, self).__init__(**kwargs)
        self.const = mx.gluon.Constant([[1,2],[3,4]])
Parameters:

value (array-like) – Initial value for the constant.

exception mxnet.gluon.parameter.DeferredInitializationError[source]

Bases: MXNetError

Error for unfinished deferred initialization.

class mxnet.gluon.parameter.Parameter(name='weight', grad_req='write', shape=None, dtype=<class 'numpy.float32'>, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True, stype='default', grad_stype='default')[source]

Bases: object

A Container holding parameters (weights) of Blocks.

Parameter holds a copy of the parameter on each Device after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Device:

device = mx.gpu(0)
x = mx.np.zeros((16, 100), device=device)
w = mx.gluon.Parameter('fc_weight', shape=(64, 100), init=mx.init.Xavier())
b = mx.gluon.Parameter('fc_bias', shape=(64,), init=mx.init.Zero())
w.initialize(device=device)
b.initialize(device=device)
out = mx.npx.fully_connected(x, w.data(device), b.data(device), num_hidden=64)
Parameters:
  • name (str, default 'weight') – Name of this parameter. It decides the corresponding default initializer.

  • grad_req ({'write', 'add', 'null'}, default 'write') –

    Specifies how gradients are written to the grad arrays.

    • 'write' means the gradient is written to the grad NDArray every time.

    • 'add' means the gradient is added to the grad NDArray every time. You need to manually call zero_grad() to clear the gradient buffer before each iteration when using this option.

    • 'null' means no gradient is requested for this parameter. Gradient arrays will not be allocated.

  • shape (int or tuple of int, default None) – Shape of this parameter. By default the shape is not specified. A Parameter with unknown shape can be used with the Symbol API, but initialization will throw an error when using the NDArray API.

  • dtype (numpy.dtype or str, default 'float32') – Data type of this parameter. For example, numpy.float32 or 'float32'.

  • lr_mult (float, default 1.0) – Learning rate multiplier. Learning rate will be multiplied by lr_mult when updating this parameter with optimizer.

  • wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similarly to lr_mult.

  • init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.

  • stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter.

  • grad_stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter’s gradient.

grad_req

This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don’t need the gradient with respect to x.

Type:

{‘write’, ‘add’, ‘null’}
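
The difference between the three grad_req values can be illustrated with a small pure-Python model. This is only a sketch of the semantics described above and does not use MXNet; the GradBuffer class and its backward method are hypothetical names introduced for illustration.

```python
class GradBuffer:
    """Toy model of a gradient buffer (hypothetical, not part of MXNet)."""

    def __init__(self, size, grad_req="write"):
        if grad_req not in ("write", "add", "null"):
            raise ValueError("unknown grad_req: %r" % grad_req)
        self.grad_req = grad_req
        # With 'null' no buffer is allocated at all.
        self.grad = None if grad_req == "null" else [0.0] * size

    def backward(self, new_grad):
        """Store a freshly computed gradient according to grad_req."""
        if self.grad_req == "write":
            # Overwrite whatever was in the buffer.
            self.grad = list(new_grad)
        elif self.grad_req == "add":
            # Accumulate on top of the existing buffer.
            self.grad = [g + n for g, n in zip(self.grad, new_grad)]
        # 'null': nothing is stored.

    def zero_grad(self):
        """Reset the buffer; a no-op when no gradient is kept."""
        if self.grad is not None:
            self.grad = [0.0] * len(self.grad)


write_buf = GradBuffer(2, "write")
add_buf = GradBuffer(2, "add")
for _ in range(2):
    write_buf.backward([1.0, 2.0])
    add_buf.backward([1.0, 2.0])
```

After two backward passes the 'write' buffer holds only the latest gradient, while the 'add' buffer holds the sum of both, which is why zero_grad() has to be called between iterations in that mode.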

lr_mult

Local learning rate multiplier for this Parameter. The actual learning rate is calculated as learning_rate * lr_mult. You can set it with, for example, param.lr_mult = 2.0.

Type:

float

wd_mult

Local weight decay multiplier for this Parameter.

Type:

float
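
As a rough sketch of how the two multipliers enter an update, the following pure-Python function applies one plain SGD step with weight decay. The function name sgd_step and the plain-SGD form are assumptions made for illustration; real optimizers may also rescale or clip gradients and keep momentum state.

```python
def sgd_step(weight, grad, learning_rate, wd, lr_mult=1.0, wd_mult=1.0):
    """One plain SGD step on a list of floats (illustrative only)."""
    lr = learning_rate * lr_mult   # effective learning rate for this parameter
    decay = wd * wd_mult           # effective weight decay for this parameter
    return [w - lr * (g + decay * w) for w, g in zip(weight, grad)]


# Same gradient, but the second parameter learns twice as fast.
base = sgd_step([1.0], [0.5], learning_rate=0.1, wd=0.0)
scaled = sgd_step([1.0], [0.5], learning_rate=0.1, wd=0.0, lr_mult=2.0)
```

Setting lr_mult or wd_mult to 0.0 in this model disables the corresponding term for that parameter only, leaving the global learning_rate and wd untouched for all others.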

cast(dtype)[source]

Cast data and gradient of this Parameter to a new data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

data(device=None)[source]

Returns a copy of this parameter on one device. Must have been initialized on this device before. For sparse parameters, use Parameter.row_sparse_data() instead.

Parameters:

device (Device) – Desired device.

Return type:

NDArray on device

property dtype

The type of the parameter.

Setting the dtype value is equivalent to casting the value of the parameter.

grad(device=None)[source]

Returns a gradient buffer for this parameter on one device.

Parameters:

device (Device) – Desired device.

initialize(init=None, device=None, default_init=<mxnet.initializer.Uniform object>, force_reinit=False)[source]

Initializes parameter and gradient arrays. Only used for NDArray API.

Parameters:
  • init (Initializer) – The initializer to use. Overrides Parameter.init and default_init.

  • device (Device or list of Device, default device.current_device().) –

    Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

    Note

    Copies are independent arrays. User is responsible for keeping their values consistent when updating. Normally gluon.Trainer does this for you.

  • default_init (Initializer) – Default initializer, used when both the init argument and Parameter.init are None.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
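
The precedence among the three initializer sources described in this list can be written down as a tiny helper. This is a sketch of the documented rule only; resolve_initializer is a hypothetical name and is not part of MXNet.

```python
def resolve_initializer(init, param_init, default_init):
    """Pick an initializer following the documented precedence (hypothetical helper)."""
    if init is not None:
        return init          # argument passed to initialize()
    if param_init is not None:
        return param_init    # initializer stored on the Parameter
    return default_init      # fallback used when both of the above are None


# Strings stand in for real Initializer objects.
chosen = resolve_initializer(None, "xavier", "uniform")
```

Here chosen is the Parameter's own initializer, because no init argument was given and the default is only a fallback.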

Examples

>>> weight = mx.gluon.Parameter('weight', shape=(2, 2))
>>> weight.initialize(device=mx.cpu(0))
>>> weight.data()
[[-0.01068833  0.01729892]
 [ 0.02042518 -0.01618656]]
<NDArray 2x2 @cpu(0)>
>>> weight.grad()
[[ 0.  0.]
 [ 0.  0.]]
<NDArray 2x2 @cpu(0)>
>>> weight.initialize(device=[mx.gpu(0), mx.gpu(1)])
>>> weight.data(mx.gpu(0))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(0)>
>>> weight.data(mx.gpu(1))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(1)>
list_ctx()[source]

This function has been deprecated. Please refer to Parameter.list_device.

list_data()[source]

Returns copies of this parameter on all devices, in the same order as creation. For sparse parameters, use Parameter.list_row_sparse_data() instead.

Return type:

list of NDArrays

list_device()[source]

Returns a list of devices this parameter is initialized on.

list_grad()[source]

Returns gradient buffers on all devices, in the same order as list_data().

list_row_sparse_data(row_id)[source]

Returns copies of the ‘row_sparse’ parameter on all devices, in the same order as creation. The copy only retains rows whose ids occur in provided row ids. The parameter must have been initialized before.

Parameters:

row_id (NDArray) – Row ids to retain for the ‘row_sparse’ parameter.

Return type:

list of NDArrays

reset_ctx(ctx)[source]

This function has been deprecated. Please refer to Parameter.reset_device.

reset_device(device)[source]

Re-assign Parameter to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

row_sparse_data(row_id)[source]

Returns a copy of the ‘row_sparse’ parameter on the same device as row_id’s. The copy only retains rows whose ids occur in provided row ids. The parameter must have been initialized on this device before.

Parameters:

row_id (NDArray) – Row ids to retain for the ‘row_sparse’ parameter.

Return type:

NDArray on row_id’s device
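
The row retention described above can be pictured with a conceptual pure-Python model. This sketch only illustrates which rows survive; it says nothing about how MXNet actually stores row_sparse arrays, and retain_rows is a hypothetical name.

```python
def retain_rows(weight, row_ids):
    """Return copies of only those rows of weight whose index is in row_ids."""
    wanted = set(row_ids)  # duplicate ids are harmless
    return {i: list(row) for i, row in enumerate(weight) if i in wanted}


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
kept = retain_rows(weight, [0, 2, 2])
```

In this model kept contains rows 0 and 2 only, and because the rows are copied, modifying kept does not change weight.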

set_data(data)[source]

Sets this parameter’s value on all devices.

property shape

The shape of the parameter.

By default, an unknown dimension size is 0. However, when NumPy semantics are turned on, an unknown dimension size is -1.

var()[source]

Returns a symbol representing this parameter.

zero_grad()[source]

Sets gradient buffer on all devices to 0. No action is taken if parameter is uninitialized or doesn’t require gradient.