mxnet.gluon.rnn.rnn_cell

Definition of various recurrent neural network cells.

Functions

dynamic_unroll(cell, inputs, begin_state[, ...])

Unrolls an RNN cell across time steps.

Classes

BidirectionalCell(l_cell, r_cell)

Bidirectional RNN cell.

DropoutCell(rate[, axes])

Applies dropout on input.

GRUCell(hidden_size[, ...])

Gated Rectified Unit (GRU) network cell.

HybridRecurrentCell()

HybridRecurrentCell supports hybridize.

HybridSequentialRNNCell()

Sequentially stacking multiple HybridRNN cells.

LSTMCell(hidden_size[, ...])

Long-Short Term Memory (LSTM) network cell.

LSTMPCell(hidden_size, projection_size[, ...])

Long-Short Term Memory Projected (LSTMP) network cell.

ModifierCell(base_cell)

Base class for modifier cells.

RNNCell(hidden_size[, activation, ...])

Elman RNN recurrent neural network cell.

RecurrentCell()

Abstract base class for RNN cells

ResidualCell(base_cell)

Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144).

SequentialRNNCell()

Sequentially stacking multiple RNN cells.

VariationalDropoutCell(base_cell[, ...])

Applies Variational Dropout on base cell.

ZoneoutCell(base_cell[, zoneout_outputs, ...])

Applies Zoneout on base cell.

class mxnet.gluon.rnn.rnn_cell.BidirectionalCell(l_cell, r_cell)[source]

Bases: HybridRecurrentCell

Bidirectional RNN cell.

Parameters:
apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x, *args, **kwargs)

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.DropoutCell(rate, axes=())[source]

Bases: HybridRecurrentCell

Applies dropout on input.

Parameters:
  • rate (float) – Percentage of elements to drop out, which is 1 - percentage to retain.

  • axes (tuple of int, default ()) – The axes on which dropout mask is shared. If empty, regular dropout is applied.

Inputs:
  • data: input tensor with shape (batch_size, size).

  • states: a list of recurrent state tensors.

Outputs:
  • out: output tensor with shape (batch_size, size).

  • next_states: returns input states directly.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.GRUCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]

Bases: HybridRecurrentCell

Gated Rectified Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).

Each call computes the following function:

\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.

Parameters:
  • hidden_size (int) – Number of units in output symbol.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

  • activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.

  • recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.

Inputs:
  • data: input tensor with shape (batch_size, input_size).

  • states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:
  • out: output tensor with shape (batch_size, num_hidden).

  • next_states: a list of one output recurrent state tensor with the same shape as states.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell[source]

Bases: RecurrentCell, HybridBlock

HybridRecurrentCell supports hybridize.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x, *args, **kwargs)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.HybridSequentialRNNCell[source]

Bases: HybridRecurrentCell

Sequentially stacking multiple HybridRNN cells.

add(cell)[source]

Appends a cell into the stack.

Parameters:

cell (RecurrentCell) – The cell to add.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(_, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.LSTMCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]

Bases: HybridRecurrentCell

Long-Short Term Memory (LSTM) network cell.

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hc} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters:
  • hidden_size (int) – Number of units in output symbol.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

  • activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.

  • recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.

  • Inputs

    • data: input tensor with shape (batch_size, input_size).

    • states: a list of two initial recurrent state tensors. Each has shape (batch_size, num_hidden).

  • Outputs

    • out: output tensor with shape (batch_size, num_hidden).

    • next_states: a list of two output recurrent state tensors. Each has the same shape as states.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.LSTMPCell(hidden_size, projection_size, i2h_weight_initializer=None, h2h_weight_initializer=None, h2r_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0)[source]

Bases: HybridRecurrentCell

Long-Short Term Memory Projected (LSTMP) network cell. (https://arxiv.org/abs/1402.1128)

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{ri} r_{(t-1)} + b_{ri}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{rf} r_{(t-1)} + b_{rf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{rc} r_{(t-1)} + b_{rg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ro} r_{(t-1)} + b_{ro}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ r_t = W_{hr} h_t \end{array}\end{split}\]

where \(r_t\) is the projected recurrent activation at time t, \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters:
  • hidden_size (int) – Number of units in cell state symbol.

  • projection_size (int) – Number of units in output symbol.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the hidden state.

  • h2r_weight_initializer (str or Initializer) – Initializer for the projection weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.

  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

  • Inputs

    • data: input tensor with shape (batch_size, input_size).

    • states: a list of two initial recurrent state tensors, with shape (batch_size, projection_size) and (batch_size, hidden_size) respectively.

  • Outputs

    • out: output tensor with shape (batch_size, num_hidden).

    • next_states: a list of two output recurrent state tensors. Each has the same shape as states.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.ModifierCell(base_cell)[source]

Bases: HybridRecurrentCell

Base class for modifier cells. A modifier cell takes a base cell, apply modifications on it (e.g. Zoneout), and returns a new cell.

After applying modifiers the base cell should no longer be called directly. The modifier cell should be used instead.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(func=<function zeros>, **kwargs)[source]

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Return an attribute of instance, which is of type owner.

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.RNNCell(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0)[source]

Bases: HybridRecurrentCell

Elman RNN recurrent neural network cell.

Each call computes the following function:

\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]

where \(h_t\) is the hidden state at time t, and \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer. If nonlinearity=’relu’, then ReLU is used instead of tanh.

Parameters:
  • hidden_size (int) – Number of units in output symbol

  • activation (str or Symbol, default 'tanh') – Type of activation function.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

Inputs:
  • data: input tensor with shape (batch_size, input_size).

  • states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:
  • out: output tensor with shape (batch_size, num_hidden).

  • next_states: a list of one output recurrent state tensor with the same shape as states.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.RecurrentCell[source]

Bases: Block

Abstract base class for RNN cells

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)[source]

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()[source]

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.ResidualCell(base_cell)[source]

Bases: ModifierCell

Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). Output of the cell is output of the base cell plus input.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Return an attribute of instance, which is of type owner.

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.SequentialRNNCell[source]

Bases: RecurrentCell

Sequentially stacking multiple RNN cells.

add(cell)[source]

Appends a cell into the stack.

Parameters:

cell (RecurrentCell) – The cell to add.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

forward(*args, **kwargs)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.VariationalDropoutCell(base_cell, drop_inputs=0.0, drop_states=0.0, drop_outputs=0.0)[source]

Bases: ModifierCell

Applies Variational Dropout on base cell. https://arxiv.org/pdf/1512.05287.pdf

Variational dropout uses the same dropout mask across time-steps. It can be applied to RNN inputs, outputs, and states. The masks for them are not shared.

The dropout mask is initialized when stepping forward for the first time and will remain the same until .reset() is called. Thus, if using the cell and stepping manually without calling .unroll(), the .reset() should be called after each sequence.

Parameters:
  • base_cell (RecurrentCell) – The cell on which to perform variational dropout.

  • drop_inputs (float, default 0.) – The dropout rate for inputs. Won’t apply dropout if it equals 0.

  • drop_states (float, default 0.) – The dropout rate for state inputs on the first state channel. Won’t apply dropout if it equals 0.

  • drop_outputs (float, default 0.) – The dropout rate for outputs. Won’t apply dropout if it equals 0.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Return an attribute of instance, which is of type owner.

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()[source]

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.rnn.rnn_cell.ZoneoutCell(base_cell, zoneout_outputs=0.0, zoneout_states=0.0)[source]

Bases: ModifierCell

Applies Zoneout on base cell.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Return type:

this block

begin_state(func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block and all of its children’s Parameters(default), also can returns the select Dict which match some given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – regular expressions

Return type:

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.

Parameters:
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state

This function can provide the states for the first time step.

unroll

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer description of HybridBlock hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameter s of this Block and its children.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters:
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters:
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)

# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)

Parameters:
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Return an attribute of instance, which is of type owner.

register_child(block, name=None)

Registers block as a child of self. Block s assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:

hook (callable) – The forward hook function of form hook(block, input) -> None.

Return type:

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters:
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()[source]

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters:

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which equals to

dense1.weight = dense0.weight dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:

shared (Dict) – Dict of the shared parameters.

Return type:

this block

state_info(batch_size=0)

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.

Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.