mxnet.gluon.rnn.rnn_layer

Definition of various recurrent neural network layers.

Classes

GRU(hidden_size[, num_layers, layout, ...])

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

LSTM(hidden_size[, num_layers, layout, ...])

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

RNN(hidden_size[, num_layers, activation, ...])

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

class mxnet.gluon.rnn.rnn_layer.GRU(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', dtype='float32', **kwargs)

Bases: _RNNLayer

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. Note: this is the cuDNN variant of the GRU, a slight modification of Cho et al. (2014) in which the reset gate \(r_t\) is applied after the matrix multiplication with \(h_{(t-1)}\) rather than before it.

For each element in the input sequence, each layer computes the following function:

\[\begin{array}{ll}
r_t = \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t = (1 - i_t) * n_t + i_t * h_{(t-1)}
\end{array}\]

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or the input at time t for the first layer, \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively, and \(*\) denotes element-wise multiplication.
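The equations above can be traced in a few lines of NumPy. The sketch below computes one GRU time step for a batch; it is illustrative only and does not reflect how MXNet stores or fuses its parameters. The helpers `sigmoid` and `gru_step`, and the per-gate dictionaries `W` (input weights \(W_{i\cdot}\)), `U` (recurrent weights \(W_{h\cdot}\)), `b_i` and `b_h`, are hypothetical names. The `reset_after` flag contrasts the cuDNN-style update with a rough version of the Cho et al. (2014) form, where the reset gate multiplies \(h_{(t-1)}\) before the matrix product.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x_t, h_prev, W, U, b_i, b_h, reset_after=True):
    """One GRU time step (hypothetical helper, not part of MXNet).

    x_t: (batch_size, input_size); h_prev: (batch_size, hidden_size).
    W, U, b_i, b_h: dicts keyed by gate name "r", "i", "n" holding the
    input weights, recurrent weights, input biases and recurrent biases.
    """
    r = sigmoid(x_t @ W["r"].T + b_i["r"] + h_prev @ U["r"].T + b_h["r"])  # reset gate
    i = sigmoid(x_t @ W["i"].T + b_i["i"] + h_prev @ U["i"].T + b_h["i"])  # input gate
    if reset_after:
        # cuDNN variant (as in the equations above): r_t scales the recurrent product
        n = np.tanh(x_t @ W["n"].T + b_i["n"] + r * (h_prev @ U["n"].T + b_h["n"]))
    else:
        # Roughly Cho et al. (2014): r_t scales h_{t-1} before the matrix product
        n = np.tanh(x_t @ W["n"].T + b_i["n"] + (r * h_prev) @ U["n"].T + b_h["n"])
    return (1.0 - i) * n + i * h_prev  # new hidden state h_t


rng = np.random.default_rng(0)
N, C, H = 2, 4, 3  # batch_size, input_size, hidden_size
W = {g: rng.standard_normal((H, C)) for g in "rin"}
U = {g: rng.standard_normal((H, H)) for g in "rin"}
b_i = {g: np.zeros(H) for g in "rin"}
b_h = {g: np.zeros(H) for g in "rin"}
h1 = gru_step(rng.standard_normal((N, C)), np.zeros((N, H)), W, U, b_i, b_h)
print(h1.shape)  # (2, 3)
```

With \(h_{(t-1)} = 0\) and zero recurrent biases the two reset placements give the same result; they diverge once the previous state is non-zero.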

Parameters:
  • hidden_size (int) – The number of features in the hidden state h.

  • num_layers (int, default 1) – Number of recurrent layers.

  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the input-to-hidden bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the hidden-to-hidden bias vector.

  • dtype (str, default 'float32') – Data type used to initialize the parameters and the default begin states.

  • input_size (int, default 0) – The number of expected features in the input x. If 0 (not specified), it is inferred from the input.

Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using the transpose() operator, which adds performance overhead. Consider creating batches in TNC layout during the data batching step.

  • states: initial recurrent state tensor with shape (num_layers, batch_size, hidden_size). If bidirectional is True, the shape is instead (2*num_layers, batch_size, hidden_size). If states is None, zeros are used as the default begin states.

Outputs:
  • out: output tensor with shape (sequence_length, batch_size, hidden_size) when layout is “TNC”. If bidirectional is True, the output shape is instead (sequence_length, batch_size, 2*hidden_size).

  • out_states: output recurrent state tensor with the same shape as states. If states is None, out_states is not returned.
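The shape rules for the inputs and outputs above can be summarised in a small pure-Python helper. `rnn_shapes` is a hypothetical name, not an MXNet function. It assumes, as we understand the layer, that only the 'TNC' and 'NTC' layouts are accepted and that the begin and end states keep the (num_layers, batch_size, hidden_size) shape regardless of layout, with the first axis doubled when bidirectional is True.

```python
def rnn_shapes(layout, seq_len, batch_size, input_size, hidden_size,
               num_layers=1, bidirectional=False):
    """Return the expected (data, states, out) shapes (hypothetical helper)."""
    # Assumption: only these two layouts are supported by the layer
    if layout not in ("TNC", "NTC"):
        raise ValueError("layout must be 'TNC' or 'NTC', got %r" % layout)
    directions = 2 if bidirectional else 1
    axis_size = {"T": seq_len, "N": batch_size}
    # data and out follow the layout; only the size of the feature axis C differs
    data = tuple(axis_size.get(a, input_size) for a in layout)
    out = tuple(axis_size.get(a, directions * hidden_size) for a in layout)
    # states do not depend on the layout
    states = (directions * num_layers, batch_size, hidden_size)
    return data, states, out


print(rnn_shapes("TNC", 5, 3, 10, 100, num_layers=3))
# ((5, 3, 10), (3, 3, 100), (5, 3, 100))
print(rnn_shapes("NTC", 5, 3, 10, 100, num_layers=3, bidirectional=True))
# ((3, 5, 10), (6, 3, 100), (3, 5, 200))
```

The first call matches the shapes used in the example below (input (5, 3, 10), begin state (3, 3, 100)).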

Examples

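>>> import mxnet as mx
>>> # 3-layer GRU with 100 hidden units; input_size is inferred on the first call
>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> # input in TNC layout: (sequence_length=5, batch_size=3, input_size=10)
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default, zeros are used as the begin state
>>> output = layer(input)
>>> # manually specify the begin state: (num_layers, batch_size, hidden_size)
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, h0)
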
class mxnet.gluon.rnn.rnn_layer.LSTM(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, dtype='float32', **kwargs)

Bases: _RNNLayer

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[\begin{array}{ll}
i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t * c_{(t-1)} + i_t * g_t \\
h_t = o_t * \tanh(c_t)
\end{array}\]

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or the input at time t for the first layer, \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and output gates, respectively, and \(*\) denotes element-wise multiplication.
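As with the GRU, one LSTM time step can be sketched in NumPy directly from the equations above. `lstm_step` and the per-gate dictionaries keyed by "i", "f", "g", "o" are hypothetical; MXNet packs these weights into fused parameter arrays, and this sketch ignores projection and state clipping.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b_i, b_h):
    """One LSTM time step (hypothetical helper, not part of MXNet).

    x_t: (batch_size, input_size); h_prev, c_prev: (batch_size, hidden_size).
    W, U, b_i, b_h: dicts keyed by gate "i", "f", "g", "o".
    """
    def pre(gate):
        # pre-activation W_{i*} x_t + b_{i*} + W_{h*} h_{t-1} + b_{h*}
        return x_t @ W[gate].T + b_i[gate] + h_prev @ U[gate].T + b_h[gate]

    i = sigmoid(pre("i"))  # input gate
    f = sigmoid(pre("f"))  # forget gate
    g = np.tanh(pre("g"))  # cell gate (candidate values)
    o = sigmoid(pre("o"))  # output gate
    c = f * c_prev + i * g  # new cell state c_t
    h = o * np.tanh(c)      # new hidden state h_t
    return h, c


rng = np.random.default_rng(0)
N, C, H = 2, 4, 3  # batch_size, input_size, hidden_size
W = {k: rng.standard_normal((H, C)) for k in "ifgo"}
U = {k: rng.standard_normal((H, H)) for k in "ifgo"}
b_i = {k: np.zeros(H) for k in "ifgo"}
b_h = {k: np.zeros(H) for k in "ifgo"}
h1, c1 = lstm_step(rng.standard_normal((N, C)), np.zeros((N, H)),
                   np.zeros((N, H)), W, U, b_i, b_h)
print(h1.shape, c1.shape)  # (2, 3) (2, 3)
```

Pushing the forget-gate bias high and the input-gate bias low makes \(c_t \approx c_{(t-1)}\), which is why a positive forget-gate bias is a common initialization choice.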

Parameters:
  • hidden_size (int) – The number of features in the hidden state h.

  • num_layers (int, default 1) – Number of recurrent layers.

  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the input-to-hidden bias vector. Passing 'lstmbias' initializes the forget-gate bias to 1 and all other biases to zero.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the hidden-to-hidden bias vector.

  • projection_size (int, default None) – The number of features after projection. If None, no projection is applied.

  • h2r_weight_initializer (str or Initializer, default None) – Initializer for the projected recurrent weights matrix, used for the linear transformation of the recurrent state to the projected space.

  • state_clip_min (float or None, default None) – Minimum clip value of LSTM states. This option must be used together with state_clip_max. If None, clipping is not applied.

  • state_clip_max (float or None, default None) – Maximum clip value of LSTM states. This option must be used together with state_clip_min. If None, clipping is not applied.

  • state_clip_nan (boolean, default False) – Whether to stop NaN from propagating in state by clipping it to min/max. If the clipping range is not specified, this option is ignored.

  • dtype (str, default 'float32') – Data type used to initialize the parameters and the default begin states.

  • input_size (int, default 0) – The number of expected features in the input x. If 0 (not specified), it is inferred from the input.

Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using the transpose() operator, which adds performance overhead. Consider creating batches in TNC layout during the data batching step.

  • states: a list of two initial recurrent state tensors, [h0, c0]. Each has shape (num_layers, batch_size, hidden_size). If bidirectional is True, the shape is instead (2*num_layers, batch_size, hidden_size). If states is None, zeros are used as the default begin states.

Outputs:
  • out: output tensor with shape (sequence_length, batch_size, hidden_size) when layout is “TNC”. If bidirectional is True, the output shape is instead (sequence_length, batch_size, 2*hidden_size).

  • out_states: a list of two output recurrent state tensors, [hn, cn], with the same shapes as states. If states is None, out_states is not returned.

Examples

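>>> import mxnet as mx
>>> # 3-layer LSTM with 100 hidden units
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> # input in TNC layout: (sequence_length=5, batch_size=3, input_size=10)
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default, zeros are used as the begin states
>>> output = layer(input)
>>> # manually specify the begin states [h0, c0], each (num_layers, batch_size, hidden_size)
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> c0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, [hn, cn] = layer(input, [h0, c0])
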
class mxnet.gluon.rnn.rnn_layer.RNN(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, dtype='float32', **kwargs)

Bases: _RNNLayer

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})\]

where \(h_t\) is the hidden state at time t, and \(x_t\) is the output of the previous layer at time t or the input at time t for the first layer. The formula shows activation='tanh'; with activation='relu' (the default), ReLU is used instead of tanh.
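The per-step formula, the way \(x_t\) is fed from one layer to the next, and the dropout parameter described below can be combined into a sketch of a full stacked forward pass. `elman_rnn` and its `params` list of per-layer `(W_ih, b_ih, W_hh, b_hh)` tuples are hypothetical. The sketch takes TNC input, omits bidirectionality, and applies inverted dropout between layers unconditionally; in a real network dropout is normally active only during training, and MXNet's own dropout implementation may differ.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def elman_rnn(x, params, activation="relu", dropout=0.0, h0=None, rng=None):
    """Stacked Elman RNN over a TNC input (hypothetical helper, not MXNet's API).

    x: (seq_len, batch_size, input_size).
    params: one (W_ih, b_ih, W_hh, b_hh) tuple per layer.
    Returns out (seq_len, batch_size, hidden_size) and
    h_n (num_layers, batch_size, hidden_size).
    """
    activations = {"relu": relu, "tanh": np.tanh}
    if activation not in activations:
        raise ValueError("activation must be 'relu' or 'tanh'")
    act = activations[activation]
    num_layers = len(params)
    seq_len, batch_size, _ = x.shape
    hidden_size = params[0][2].shape[0]
    if h0 is None:
        h0 = np.zeros((num_layers, batch_size, hidden_size))  # default begin state
    if rng is None:
        rng = np.random.default_rng()
    layer_input, h_n = x, []
    for layer, (W_ih, b_ih, W_hh, b_hh) in enumerate(params):
        h, outputs = h0[layer], []
        for t in range(seq_len):
            h = act(layer_input[t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
            outputs.append(h)
        h_n.append(h)
        layer_input = np.stack(outputs)  # becomes x_t for the next layer
        if dropout > 0 and layer < num_layers - 1:
            # inverted dropout on every layer's output except the last
            keep = rng.random(layer_input.shape) >= dropout
            layer_input = layer_input * keep / (1.0 - dropout)
    return layer_input, np.stack(h_n)


rng = np.random.default_rng(0)
T, N, C, H, L = 5, 3, 10, 4, 2  # seq_len, batch, input_size, hidden_size, layers
params = [(rng.standard_normal((H, C if k == 0 else H)), np.zeros(H),
           rng.standard_normal((H, H)), np.zeros(H)) for k in range(L)]
out, h_n = elman_rnn(rng.standard_normal((T, N, C)), params, dropout=0.5, rng=rng)
print(out.shape, h_n.shape)  # (5, 3, 4) (2, 3, 4)
```

Because the last layer is never dropped out, the final time step of `out` equals the last layer's entry in `h_n`.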

Parameters:
  • hidden_size (int) – The number of features in the hidden state h.

  • num_layers (int, default 1) – Number of recurrent layers.

  • activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.

  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the input-to-hidden bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the hidden-to-hidden bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If 0 (not specified), it is inferred from the input.

  • dtype (str, default 'float32') – Data type used to initialize the parameters and the default begin states.

Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using the transpose() operator, which adds performance overhead. Consider creating batches in TNC layout during the data batching step.

  • states: initial recurrent state tensor with shape (num_layers, batch_size, hidden_size). If bidirectional is True, the shape is instead (2*num_layers, batch_size, hidden_size). If states is None, zeros are used as the default begin states.

Outputs:
  • out: output tensor with shape (sequence_length, batch_size, hidden_size) when layout is “TNC”. If bidirectional is True, the output shape is instead (sequence_length, batch_size, 2*hidden_size).

  • out_states: output recurrent state tensor with the same shape as states. If states is None, out_states is not returned.

Examples

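>>> import mxnet as mx
>>> # 3-layer RNN with 100 hidden units and the default ReLU activation
>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> # input in TNC layout: (sequence_length=5, batch_size=3, input_size=10)
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default, zeros are used as the begin state
>>> output = layer(input)
>>> # manually specify the begin state: (num_layers, batch_size, hidden_size)
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, h0)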