mxnet.gluon.nn.conv_layers

Convolutional neural network layers.

Classes

`AvgPool1D`([pool_size, strides, padding, ...])	Average pooling operation for temporal data.
`AvgPool2D`([pool_size, strides, padding, ...])	Average pooling operation for spatial data.
`AvgPool3D`([pool_size, strides, padding, ...])	Average pooling operation for 3D data (spatial or spatio-temporal).
`Conv1D`(channels, kernel_size[, strides, ...])	1D convolution layer (e.g. temporal convolution).
`Conv1DTranspose`(channels, kernel_size[, ...])	Transposed 1D convolution layer (sometimes called Deconvolution).
`Conv2D`(channels, kernel_size[, strides, ...])	2D convolution layer (e.g. spatial convolution over images).
`Conv2DTranspose`(channels, kernel_size[, ...])	Transposed 2D convolution layer (sometimes called Deconvolution).
`Conv3D`(channels, kernel_size[, strides, ...])	3D convolution layer (e.g. spatial convolution over volumes).
`Conv3DTranspose`(channels, kernel_size[, ...])	Transposed 3D convolution layer (sometimes called Deconvolution).
`DeformableConvolution`(channels[, ...])	2-D Deformable Convolution v1 (Dai, 2017).
`GlobalAvgPool1D`([layout])	Global average pooling operation for temporal data.
`GlobalAvgPool2D`([layout])	Global average pooling operation for spatial data.
`GlobalAvgPool3D`([layout])	Global average pooling operation for 3D data (spatial or spatio-temporal).
`GlobalMaxPool1D`([layout])	Global max pooling operation for one dimensional (temporal) data.
`GlobalMaxPool2D`([layout])	Global max pooling operation for two dimensional (spatial) data.
`GlobalMaxPool3D`([layout])	Global max pooling operation for 3D data (spatial or spatio-temporal).
`MaxPool1D`([pool_size, strides, padding, ...])	Max pooling operation for one dimensional data.
`MaxPool2D`([pool_size, strides, padding, ...])	Max pooling operation for two dimensional (spatial) data.
`MaxPool3D`([pool_size, strides, padding, ...])	Max pooling operation for 3D data (spatial or spatio-temporal).
`ModulatedDeformableConvolution`(channels[, ...])	2-D Deformable Convolution v2 (Dai, 2018).
`PixelShuffle1D`(factor)	Pixel-shuffle layer for upsampling in 1 dimension.
`PixelShuffle2D`(factor)	Pixel-shuffle layer for upsampling in 2 dimensions.
`PixelShuffle3D`(factor)	Pixel-shuffle layer for upsampling in 3 dimensions.
`ReflectionPad2D`([padding])	Pads the input tensor using the reflection of the input boundary.

class mxnet.gluon.nn.conv_layers.AvgPool1D(pool_size=2, strides=None, padding=0, layout='NCW', ceil_mode=False, count_include_pad=True, **kwargs)

Bases: _Pooling

Average pooling operation for temporal data.

Parameters:

pool_size (int) – Size of the average pooling windows.
strides (int, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stand for batch, channel, and width (time) dimensions respectively. padding is applied on the ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
count_include_pad (bool, default True) – When False, will exclude padding elements when computing the average value.
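
The effect of count_include_pad can be illustrated with a small pure-Python sketch. This is not the MXNet implementation: `avg_pool1d` is a hypothetical helper that works on a plain list, uses floor mode only, and assumes padding is smaller than pool_size.

```python
def avg_pool1d(values, pool_size=2, strides=None, padding=0, count_include_pad=True):
    """Average-pool a 1D list of numbers with zero padding on both sides."""
    # As in the layer, strides defaults to pool_size when it is None.
    strides = pool_size if strides is None else strides
    padded = [0.0] * padding + [float(v) for v in values] + [0.0] * padding
    # Mark which padded positions hold real data (1) and which hold padding (0).
    is_real = [0] * padding + [1] * len(values) + [0] * padding
    out = []
    for start in range(0, len(padded) - pool_size + 1, strides):
        window = padded[start:start + pool_size]
        if count_include_pad:
            divisor = pool_size
        else:
            # Divide only by the number of real (non-padding) elements.
            divisor = max(sum(is_real[start:start + pool_size]), 1)
        out.append(sum(window) / divisor)
    return out


data = [1, 2, 3, 4]
print(avg_pool1d(data, pool_size=2, strides=2, padding=1))
# [0.5, 2.5, 2.0]
print(avg_pool1d(data, pool_size=2, strides=2, padding=1, count_include_pad=False))
# [1.0, 2.5, 4.0]
```

Only the windows that overlap the padding differ between the two modes; interior windows give the same average either way.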

Inputs:

data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.

Outputs:

out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
```
out_width = floor((width+2*padding-pool_size)/strides)+1
```
When ceil_mode is True, ceil will be used instead of floor in this equation.
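
As a quick check of the formula above, the following sketch evaluates out_width in plain Python. `pool_out_width` is a hypothetical helper name, not part of MXNet; it only mirrors the documented equation, including the rule that strides defaults to pool_size.

```python
import math


def pool_out_width(width, pool_size=2, strides=None, padding=0, ceil_mode=False):
    """Evaluate the documented out_width formula for 1D pooling."""
    strides = pool_size if strides is None else strides
    # ceil_mode swaps floor for ceil, as described above.
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((width + 2 * padding - pool_size) / strides) + 1


print(pool_out_width(10))                  # 5
print(pool_out_width(7))                   # 3
print(pool_out_width(7, ceil_mode=True))   # 4
```

With an odd width of 7 and the default window of 2, floor mode drops the last element while ceil mode keeps a final, partially filled window.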

class mxnet.gluon.nn.conv_layers.AvgPool2D(pool_size=(2, 2), strides=None, padding=0, ceil_mode=False, layout='NCHW', count_include_pad=True, **kwargs)

Bases: _Pooling

Average pooling operation for spatial data.

Parameters:

pool_size (int or list/tuple of 2 ints) – Size of the average pooling windows.
strides (int, list/tuple of 2 ints, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 2 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stand for batch, channel, height, and width dimensions respectively. padding is applied on the ‘H’ and ‘W’ dimensions.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
count_include_pad (bool, default True) – When False, will exclude padding elements when computing the average value.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
```
out_height = floor((height+2*padding[0]-pool_size[0])/strides[0])+1
out_width = floor((width+2*padding[1]-pool_size[1])/strides[1])+1
```
When ceil_mode is True, ceil will be used instead of floor in these equations.

class mxnet.gluon.nn.conv_layers.AvgPool3D(pool_size=(2, 2, 2), strides=None, padding=0, ceil_mode=False, layout='NCDHW', count_include_pad=True, **kwargs)

Bases: _Pooling

Average pooling operation for 3D data (spatial or spatio-temporal).

Parameters:

pool_size (int or list/tuple of 3 ints) – Size of the average pooling windows.
strides (int, list/tuple of 3 ints, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 3 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. padding is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
count_include_pad (bool, default True) – When False, will exclude padding elements when computing the average value.

Inputs:

data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.

Outputs:

out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:
```
out_depth = floor((depth+2*padding[0]-pool_size[0])/strides[0])+1
out_height = floor((height+2*padding[1]-pool_size[1])/strides[1])+1
out_width = floor((width+2*padding[2]-pool_size[2])/strides[2])+1
```
When ceil_mode is True, ceil will be used instead of floor in these equations.
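
The per-axis equations above can be evaluated for all three spatial axes at once. The sketch below is plain Python, not MXNet code; `as_tuple` and `pool_out_shape` are hypothetical helpers, and the broadcasting of a single int to every axis is an assumption based on the parameter types listed above.

```python
import math


def as_tuple(value, ndim):
    """Repeat a single int for every axis, or pass a sequence through."""
    return (value,) * ndim if isinstance(value, int) else tuple(value)


def pool_out_shape(spatial, pool_size=(2, 2, 2), strides=None, padding=0, ceil_mode=False):
    """Apply the documented pooling formula to (depth, height, width)."""
    ndim = len(spatial)
    pool_size = as_tuple(pool_size, ndim)
    # strides defaults to pool_size when it is None.
    strides = pool_size if strides is None else as_tuple(strides, ndim)
    padding = as_tuple(padding, ndim)
    rounding = math.ceil if ceil_mode else math.floor
    return tuple(
        rounding((n + 2 * p - k) / s) + 1
        for n, k, s, p in zip(spatial, pool_size, strides, padding)
    )


print(pool_out_shape((8, 32, 32)))                        # (4, 16, 16)
print(pool_out_shape((8, 32, 32), pool_size=(1, 2, 2)))   # (8, 16, 16)
```

The second call pools only over height and width, which leaves the depth (for example, time) axis untouched.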

class mxnet.gluon.nn.conv_layers.Conv1D(channels, kernel_size, strides=1, padding=0, dilation=1, groups=1, layout='NCW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)

Bases: _Conv

1D convolution layer (e.g. temporal convolution).

This layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.

If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.

Parameters:

channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 1 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 1 int) – Specifies the strides of the convolution.
padding (int or a tuple/list of 1 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
dilation (int or tuple/list of 1 int) – Specifies the dilation rate to use for dilated convolution.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCW') – Dimension ordering of data and weight. Only supports ‘NCW’ layout for now. ‘N’, ‘C’, ‘W’ stand for batch, channel, and width (time) dimensions respectively. Convolution is applied on the ‘W’ dimension.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
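
To make the groups parameter concrete, the sketch below counts weights for a grouped 1D convolution. It is plain Python, not MXNet code; the helper names are hypothetical, and the weight layout (channels, in_channels // groups, kernel_size) is an assumption based on the conventional layout for grouped convolutions.

```python
def grouped_conv1d_weight_shape(channels, in_channels, kernel_size, groups=1):
    """Weight shape when each group sees in_channels / groups input channels."""
    if in_channels % groups or channels % groups:
        raise ValueError("channels and in_channels must both be divisible by groups")
    return (channels, in_channels // groups, kernel_size)


def num_weights(shape):
    """Multiply the dimensions of a shape together."""
    count = 1
    for dim in shape:
        count *= dim
    return count


dense = grouped_conv1d_weight_shape(channels=8, in_channels=4, kernel_size=3, groups=1)
split = grouped_conv1d_weight_shape(channels=8, in_channels=4, kernel_size=3, groups=2)
print(dense, num_weights(dense))   # (8, 4, 3) 96
print(split, num_weights(split))   # (8, 2, 3) 48
```

With groups=2, each output channel connects to only half of the input channels, so the weight count halves.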

Inputs:

data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.

Outputs:

out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
```
out_width = floor((width+2*padding-dilation*(kernel_size-1)-1)/strides)+1
```
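
The out_width formula can be checked with a few lines of plain Python. `conv_out_width` is a hypothetical helper, not an MXNet function; it only restates the equation above.

```python
import math


def conv_out_width(width, kernel_size, strides=1, padding=0, dilation=1):
    """Evaluate the documented out_width formula for a 1D convolution."""
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input points.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return math.floor((width + 2 * padding - effective_kernel) / strides) + 1


print(conv_out_width(10, kernel_size=3))                         # 8
print(conv_out_width(10, kernel_size=3, padding=1))              # 10
print(conv_out_width(10, kernel_size=3, dilation=2))             # 6
print(conv_out_width(10, kernel_size=3, strides=2, padding=1))   # 5
```

For an odd kernel_size with strides=1 and dilation=1, padding of (kernel_size - 1) / 2 keeps the width unchanged, as the second call shows.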

class mxnet.gluon.nn.conv_layers.Conv1DTranspose(channels, kernel_size, strides=1, padding=0, output_padding=0, dilation=1, groups=1, layout='NCW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)

Bases: _Conv

Transposed 1D convolution layer (sometimes called Deconvolution).

The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution.

If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.

Parameters:

channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 1 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 1 int) – Specifies the strides of the convolution.
padding (int or a tuple/list of 1 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
output_padding (int or a tuple/list of 1 int) – Controls the amount of implicit zero-paddings on both sides of the output for output_padding number of points for each dimension.
dilation (int or tuple/list of 1 int) – Controls the spacing between the kernel points; also known as the à trous algorithm.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCW') – Dimension ordering of data and weight. Only supports ‘NCW’ layout for now. ‘N’, ‘C’, ‘W’ stand for batch, channel, and width (time) dimensions respectively. Convolution is applied on the ‘W’ dimension.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.

Inputs:

data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.

Outputs:

out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
```
out_width = (width-1)*strides-2*padding+kernel_size+output_padding
```
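
A short plain-Python sketch shows why output_padding exists. With strides greater than 1, several input widths of a forward convolution map to the same output width, so the transposed formula alone cannot tell which one to restore. Both helpers are hypothetical names that restate the documented equations, and the forward helper assumes dilation=1.

```python
def conv_out_width(width, kernel_size, strides=1, padding=0):
    """Forward convolution width (dilation assumed to be 1)."""
    return (width + 2 * padding - kernel_size) // strides + 1


def conv_transpose_out_width(width, kernel_size, strides=1, padding=0, output_padding=0):
    """Evaluate the documented out_width formula for a transposed 1D convolution."""
    return (width - 1) * strides - 2 * padding + kernel_size + output_padding


# Widths 9 and 10 both shrink to 5 under kernel_size=3, strides=2, padding=1.
print(conv_out_width(9, 3, strides=2, padding=1))    # 5
print(conv_out_width(10, 3, strides=2, padding=1))   # 5

# Going back from 5, output_padding selects which of the two widths is produced.
print(conv_transpose_out_width(5, 3, strides=2, padding=1))                     # 9
print(conv_transpose_out_width(5, 3, strides=2, padding=1, output_padding=1))   # 10
```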

class mxnet.gluon.nn.conv_layers.Conv2D(channels, kernel_size, strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, layout='NCHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)

Bases: _Conv

2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.

If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.

Parameters:

channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 2 ints) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 ints) – Specifies the strides of the convolution.
padding (int or a tuple/list of 2 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
dilation (int or tuple/list of 2 ints) – Specifies the dilation rate to use for dilated convolution.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCHW') – Dimension ordering of data and weight. Only supports ‘NCHW’ and ‘NHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’ stand for batch, channel, height, and width dimensions respectively. Convolution is applied on the ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:

out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1
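
The two equations above can be combined into a full NCHW output shape. The sketch below is plain Python rather than MXNet code; `conv2d_out_shape` is a hypothetical helper that takes every per-axis argument as a 2-tuple.

```python
import math


def conv2d_out_shape(data_shape, channels, kernel_size,
                     strides=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Output shape of a 2D convolution for NCHW input, per the documented formulas."""
    batch_size, _in_channels, height, width = data_shape
    out = []
    for n, k, s, p, d in zip((height, width), kernel_size, strides, padding, dilation):
        out.append(math.floor((n + 2 * p - d * (k - 1) - 1) / s) + 1)
    # The channel dimension is replaced by the layer's `channels` argument.
    return (batch_size, channels, out[0], out[1])


print(conv2d_out_shape((8, 3, 32, 32), channels=16, kernel_size=(3, 3), padding=(1, 1)))
# (8, 16, 32, 32)
print(conv2d_out_shape((1, 3, 224, 224), channels=64, kernel_size=(7, 7),
                       strides=(2, 2), padding=(3, 3)))
# (1, 64, 112, 112)
```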

class mxnet.gluon.nn.conv_layers.Conv2DTranspose(channels, kernel_size, strides=(1, 1), padding=(0, 0), output_padding=(0, 0), dilation=(1, 1), groups=1, layout='NCHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)

Bases: _Conv

Transposed 2D convolution layer (sometimes called Deconvolution).

If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.

Parameters:

channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 2 ints) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 ints) – Specifies the strides of the convolution.
padding (int or a tuple/list of 2 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
output_padding (int or a tuple/list of 2 ints) – Controls the amount of implicit zero-paddings on both sides of the output for output_padding number of points for each dimension.
dilation (int or tuple/list of 2 ints) – Controls the spacing between the kernel points; also known as the à trous algorithm.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCHW') – Dimension ordering of data and weight. Only supports ‘NCHW’ and ‘NHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’ stand for batch, channel, height, and width dimensions respectively. Convolution is applied on the ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:

out_height = (height-1)*strides[0]-2*padding[0]+kernel_size[0]+output_padding[0]
out_width = (width-1)*strides[1]-2*padding[1]+kernel_size[1]+output_padding[1]

class mxnet.gluon.nn.conv_layers.Conv3D(channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0), dilation=(1, 1, 1), groups=1, layout='NCDHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)

Bases: _Conv

3D convolution layer (e.g. spatial convolution over volumes).

If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.

Parameters:

channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 3 ints) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 3 ints) – Specifies the strides of the convolution.
padding (int or a tuple/list of 3 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
dilation (int or tuple/list of 3 ints) – Specifies the dilation rate to use for dilated convolution.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCDHW') – Dimension ordering of data and weight. Only supports ‘NCDHW’ and ‘NDHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Convolution is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.

Inputs:

data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.

Outputs:

out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:

out_depth = floor((depth+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
out_height = floor((height+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1
out_width = floor((width+2*padding[2]-dilation[2]*(kernel_size[2]-1)-1)/strides[2])+1

class mxnet.gluon.nn.conv_layers.Conv3DTranspose(channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0), output_padding=(0, 0, 0), dilation=(1, 1, 1), groups=1, layout='NCDHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)

Bases: _Conv

Transposed 3D convolution layer (sometimes called Deconvolution).

If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.

Parameters:

channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 3 ints) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 3 ints) – Specifies the strides of the convolution.
padding (int or a tuple/list of 3 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
output_padding (int or a tuple/list of 3 ints) – Controls the amount of implicit zero-paddings on both sides of the output for output_padding number of points for each dimension.
dilation (int or tuple/list of 3 ints) – Controls the spacing between the kernel points; also known as the à trous algorithm.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCDHW') – Dimension ordering of data and weight. Only supports ‘NCDHW’ and ‘NDHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Convolution is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.

Inputs:

data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.

Outputs:

out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:

out_depth = (depth-1)*strides[0]-2*padding[0]+kernel_size[0]+output_padding[0]
out_height = (height-1)*strides[1]-2*padding[1]+kernel_size[1]+output_padding[1]
out_width = (width-1)*strides[2]-2*padding[2]+kernel_size[2]+output_padding[2]

class mxnet.gluon.nn.conv_layers.DeformableConvolution(channels, kernel_size=(1, 1), strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, num_deformable_group=1, layout='NCHW', use_bias=True, in_channels=0, activation=None, weight_initializer=None, bias_initializer='zeros', offset_weight_initializer='zeros', offset_bias_initializer='zeros', offset_use_bias=True, op_name='DeformableConvolution', adj=None)

Bases: HybridBlock

2-D Deformable Convolution v1 (Dai, 2017). Normal convolution uses sampling points in a regular grid, while the sampling points of Deformable Convolution can be offset. The offset is learned with a separate convolution layer during training. Both the convolution layer for generating the output features and the convolution layer for generating the offsets are included in this Gluon layer.

Parameters:

channels (int) – The dimensionality of the output space, i.e. the number of output channels in the convolution.
kernel_size (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the strides of the convolution.
padding (int or tuple/list of 2 ints, (Default value = (0,0))) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
dilation (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dilation rate to use for dilated convolution.
groups (int, (Default value = 1)) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two convolution layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
num_deformable_group (int, (Default value = 1)) – Number of deformable group partitions.
layout (str, (Default value = NCHW)) – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, ‘NCHW’, ‘NHWC’, ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Convolution is performed over ‘D’, ‘H’, and ‘W’ dimensions.
use_bias (bool, (Default value = True)) – Whether the layer for generating the output features uses a bias vector.
in_channels (int, (Default value = 0)) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and input channels will be inferred from the shape of input data.
activation (str, (Default value = None)) – Activation function to use. See activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
weight_initializer (str or Initializer, (Default value = None)) – Initializer for the weights matrix for the convolution layer for generating the output features.
bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector for the convolution layer for generating the output features.
offset_weight_initializer (str or Initializer, (Default value = zeros)) – Initializer for the weights matrix for the convolution layer for generating the offset.
offset_bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector for the convolution layer for generating the offset.
offset_use_bias (bool, (Default value = True)) – Whether the layer for generating the offset uses a bias vector.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:

out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters: fn (callable) – Function to be applied to each submodule, of form fn(block).
Return type: this block

cast(dtype)

Cast this Block to use another data type.

Parameters: dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing the Parameters of this Block and all of its children (the default). It can also return only the Parameters whose names match a given regular expression, passed as select.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

Or collect all parameters whose names end with ‘weight’ or ‘bias’ by using a regular expression:

model.collect_params('.*weight|.*bias')

Parameters: select (str) – Regular expressions.
Return type: The selected Dict
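
The selection behaviour can be approximated with the standard `re` module. This is a sketch only: `select_params` is a hypothetical helper, the parameter names are made up, and it assumes each name is tested with `re.match`, that is, anchored at the start of the name.

```python
import re


def select_params(param_names, select=None):
    """Return the names matched by `select`, or every name when select is None."""
    if select is None:
        return list(param_names)
    pattern = re.compile(select)
    # re.match anchors at the start of the name (an assumption of this sketch).
    return [name for name in param_names if pattern.match(name)]


# Hypothetical parameter names for illustration.
names = ['conv1.weight', 'conv1.bias', 'bn1.gamma', 'fc.weight', 'fc.bias']

print(select_params(names, '.*weight|.*bias'))
# ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']
print(select_params(names, 'conv1.weight|conv1.bias|fc.weight|fc.bias'))
# ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']
```

Note that an unescaped `.` in the pattern matches any character; write `\.` if a literal dot is required.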

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, the inputs will be named data0, data1, etc.

Parameters:

path (str or None) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return the Python Symbol object and the corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators before saving the model.

Returns:

symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x)

Overrides the forward computation. Arguments must be mxnet.numpy.ndarray.

hybridize(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters:

active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

infer_shape(x)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶

Initializes Parameters of this Block and its children.

Parameters:

init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)¶

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order as when the model was saved. Each Block is uniquely identified by its class name and a unique ID assigned in order (the children are stored in an OrderedDict), and that unique ID denotes the specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:: prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from dict

Parameters:

param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
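
The interaction of allow_missing and ignore_extra can be pictured with a small name-matching sketch. This is a toy model, not MXNet's implementation: the function `check_param_names` is hypothetical, and raising `KeyError` is an assumption made for illustration only.

```python
def check_param_names(model_names, loaded_names,
                      allow_missing=False, ignore_extra=False):
    """Toy model of the name checks; not MXNet's actual implementation."""
    model_names, loaded_names = set(model_names), set(loaded_names)
    missing = sorted(model_names - loaded_names)   # in the model, not in the dict
    extra = sorted(loaded_names - model_names)     # in the dict, not in the model
    if missing and not allow_missing:
        raise KeyError(f"missing parameters: {missing}")
    if extra and not ignore_extra:
        raise KeyError(f"unexpected parameters: {extra}")
    # Only names present on both sides would actually be loaded.
    return sorted(model_names & loaded_names)


print(check_param_names(["weight", "bias"], ["weight"], allow_missing=True))
```

With both flags left at False, any mismatch between the two name sets is treated as an error in this sketch.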

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from file previously saved by save_parameters.

Parameters:

filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters:

x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params¶: Returns this Block’s parameter dictionary (does not include its children’s parameters).

pre_infer_offset_weight()[source]¶: Pre-infer the shape of the offset weight parameter based on kernel size, group size and offset channels

pre_infer_weight()[source]¶: Pre-infer the shape of weight parameter based on kernel size, group size and channels

register_child(block, name=None)¶: Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)¶

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input, output) -> None.
Return type:: mxnet.gluon.utils.HookHandle
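
The hook mechanism described above can be sketched without MXNet. `ToyBlock` and `ToyHookHandle` below are hypothetical stand-ins; the sketch assumes the hook receives the inputs as a tuple and that the returned handle offers a `detach()` method to remove the hook again.

```python
class ToyHookHandle:
    """Toy stand-in for a hook handle; detach() removes the hook."""

    def __init__(self, hooks, key):
        self._hooks, self._key = hooks, key

    def detach(self):
        self._hooks.pop(self._key, None)


class ToyBlock:
    """Hypothetical block that doubles its input and supports forward hooks."""

    def __init__(self):
        self._forward_hooks = {}
        self._next_key = 0

    def register_forward_hook(self, hook):
        key = self._next_key
        self._next_key += 1
        self._forward_hooks[key] = hook
        return ToyHookHandle(self._forward_hooks, key)

    def forward(self, x):
        return 2 * x

    def __call__(self, x):
        out = self.forward(x)
        # Hooks run right after forward() and only observe the values.
        for hook in list(self._forward_hooks.values()):
            hook(self, (x,), out)
        return out


seen = []
block = ToyBlock()
handle = block.register_forward_hook(lambda blk, inp, out: seen.append((inp, out)))
block(3)         # the hook fires
handle.detach()
block(4)         # the hook no longer fires
print(seen)
```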

register_forward_pre_hook(hook)¶

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)¶

Install op hook for block recursively.

Parameters:

callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset_ctx(ctx)¶: This function has been deprecated. Please refer to HybridBlock.reset_device.

reset_device(device)¶

Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.

Parameters:: device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)¶

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it's an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:: prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)¶

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:

filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)¶

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)

Parameters:

name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)¶

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())

which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters:: shared (Dict) – Dict of the shared parameters.
Return type:: this block
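
The difference between tying a Parameter object and merely copying values can be shown with a toy model. `ToyParameter` and `ToyLayer` are hypothetical classes written for this sketch; they only imitate the sharing semantics described above and are not MXNet code.

```python
class ToyParameter:
    """Hypothetical parameter holding a single value."""

    def __init__(self, value):
        self.value = value


class ToyLayer:
    """Hypothetical layer with one weight parameter."""

    def __init__(self, value=0.0):
        self.weight = ToyParameter(value)

    def share_parameters(self, shared):
        # Tie: both layers now reference the same parameter object.
        self.weight = shared["weight"]
        return self

    def load_dict(self, values):
        # Copy: only the stored value changes, the object stays the same.
        self.weight.value = values["weight"]


dense0, dense1 = ToyLayer(1.0), ToyLayer(2.0)
dense1.share_parameters({"weight": dense0.weight})
dense0.load_dict({"weight": 5.0})
print(dense1.weight is dense0.weight, dense1.weight.value)
```

Because the object is shared, the value loaded into dense0 is visible through dense1 as well.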

summary(*inputs)¶

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:: inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

zero_grad()¶: Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.nn.conv_layers.GlobalAvgPool1D(layout='NCW', **kwargs)[source]¶

Bases: _Pooling

Global average pooling operation for temporal data.

Parameters:: layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Pooling is applied on the ‘W’ dimension.

Inputs:

data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.

Outputs:

out: 3D output tensor with shape (batch_size, channels, 1).
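
To make the shape change concrete, here is a minimal pure-Python sketch of global average pooling over the width axis for NCW data stored as nested lists. The function `global_avg_pool_1d` is a hypothetical illustration, not the layer's implementation.

```python
def global_avg_pool_1d(data):
    """Average over W for nested lists shaped (batch, channels, width)."""
    return [[[sum(channel) / len(channel)] for channel in sample]
            for sample in data]


x = [[[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]]   # shape (1, 2, 3)
out = global_avg_pool_1d(x)                 # shape (1, 2, 1)
print(out)
```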

class mxnet.gluon.nn.conv_layers.GlobalAvgPool2D(layout='NCHW', **kwargs)[source]¶

Bases: _Pooling

Global average pooling operation for spatial data.

Parameters:: layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, 1, 1) when layout is NCHW.

class mxnet.gluon.nn.conv_layers.GlobalAvgPool3D(layout='NCDHW', **kwargs)[source]¶

Bases: _Pooling

Global average pooling operation for 3D data (spatial or spatio-temporal).

Parameters:: layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. Pooling is applied on the ‘D’, ‘H’ and ‘W’ dimensions.

Inputs:

data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.

Outputs:

out: 5D output tensor with shape (batch_size, channels, 1, 1, 1) when layout is NCDHW.

class mxnet.gluon.nn.conv_layers.GlobalMaxPool1D(layout='NCW', **kwargs)[source]¶

Bases: _Pooling

Global max pooling operation for one dimensional (temporal) data.

Parameters:: layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Pooling is applied on the W dimension.

Inputs:

data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.

Outputs:

out: 3D output tensor with shape (batch_size, channels, 1) when layout is NCW.

class mxnet.gluon.nn.conv_layers.GlobalMaxPool2D(layout='NCHW', **kwargs)[source]¶

Bases: _Pooling

Global max pooling operation for two dimensional (spatial) data.

Parameters:: layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively. Pooling is applied on the ‘H’ and ‘W’ dimensions.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, 1, 1) when layout is NCHW.
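
A minimal sketch of the reduction for NCHW data, using nested lists instead of tensors. The function `global_max_pool_2d` is hypothetical and only illustrates how each channel collapses to a single value.

```python
def global_max_pool_2d(data):
    """Max over H and W for nested lists shaped (batch, channels, height, width)."""
    return [[[[max(max(row) for row in channel)]] for channel in sample]
            for sample in data]


x = [[[[1, 5], [3, 2]], [[-1, -4], [-2, -3]]]]   # shape (1, 2, 2, 2)
out = global_max_pool_2d(x)                       # shape (1, 2, 1, 1)
print(out)
```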

class mxnet.gluon.nn.conv_layers.GlobalMaxPool3D(layout='NCDHW', **kwargs)[source]¶

Bases: _Pooling

Global max pooling operation for 3D data (spatial or spatio-temporal).

Parameters:: layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. Pooling is applied on the ‘D’, ‘H’ and ‘W’ dimensions.

Inputs:

data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.

Outputs:

out: 5D output tensor with shape (batch_size, channels, 1, 1, 1) when layout is NCDHW.

class mxnet.gluon.nn.conv_layers.MaxPool1D(pool_size=2, strides=None, padding=0, layout='NCW', ceil_mode=False, **kwargs)[source]¶

Bases: _Pooling

Max pooling operation for one dimensional data.

Parameters:

pool_size (int) – Size of the max pooling windows.
strides (int, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Pooling is applied on the W dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.

Inputs:

data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.

Outputs:

out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
```
out_width = floor((width+2*padding-pool_size)/strides)+1
```
When ceil_mode is True, ceil will be used instead of floor in this equation.
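
The formula can be checked with a short sketch. `pool_out_width` evaluates the expression above (with strides falling back to pool_size when None), and `max_pool_1d` is a naive pooling over a single channel that assumes padding=0 and strides equal to pool_size. Both function names are hypothetical.

```python
import math


def pool_out_width(width, pool_size=2, strides=None, padding=0, ceil_mode=False):
    """Output width from the formula above; strides defaults to pool_size."""
    strides = pool_size if strides is None else strides
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((width + 2 * padding - pool_size) / strides) + 1


def max_pool_1d(row, pool_size=2):
    """Naive max pooling of one channel, with padding=0 and strides=pool_size."""
    n = pool_out_width(len(row), pool_size)
    return [max(row[i * pool_size:(i + 1) * pool_size]) for i in range(n)]


print(pool_out_width(7))                    # floor(5 / 2) + 1
print(pool_out_width(7, ceil_mode=True))    # ceil(5 / 2) + 1
print(max_pool_1d([1, 3, 2, 8, 5, 4, 9]))
```

In floor mode the trailing element that does not fill a complete window is dropped.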

class mxnet.gluon.nn.conv_layers.MaxPool2D(pool_size=(2, 2), strides=None, padding=0, layout='NCHW', ceil_mode=False, **kwargs)[source]¶

Bases: _Pooling

Max pooling operation for two dimensional (spatial) data.

Parameters:

pool_size (int or list/tuple of 2 ints,) – Size of the max pooling windows.
strides (int, list/tuple of 2 ints, or None.) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 2 ints,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively. padding is applied on ‘H’ and ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
```
out_height = floor((height+2*padding[0]-pool_size[0])/strides[0])+1
out_width = floor((width+2*padding[1]-pool_size[1])/strides[1])+1
```
When ceil_mode is True, ceil will be used instead of floor in this equation.
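
Since pool_size, strides and padding may each be an int or a pair, a small helper can normalize them before applying the formulas. The names `_pair` and `max_pool_2d_out_shape` are hypothetical; this is a sketch of the arithmetic only.

```python
import math


def _pair(value):
    """Expand an int to a 2-tuple; pass tuples/lists through."""
    return (value, value) if isinstance(value, int) else tuple(value)


def max_pool_2d_out_shape(height, width, pool_size=(2, 2), strides=None,
                          padding=0, ceil_mode=False):
    """Hypothetical helper evaluating the out_height/out_width formulas."""
    pool_size = _pair(pool_size)
    strides = pool_size if strides is None else _pair(strides)
    padding = _pair(padding)
    rounding = math.ceil if ceil_mode else math.floor
    return tuple(
        rounding((size + 2 * padding[i] - pool_size[i]) / strides[i]) + 1
        for i, size in enumerate((height, width))
    )


print(max_pool_2d_out_shape(32, 32))
print(max_pool_2d_out_shape(32, 32, pool_size=3, strides=2, padding=1))
```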

class mxnet.gluon.nn.conv_layers.MaxPool3D(pool_size=(2, 2, 2), strides=None, padding=0, ceil_mode=False, layout='NCDHW', **kwargs)[source]¶

Bases: _Pooling

Max pooling operation for 3D data (spatial or spatio-temporal).

Parameters:

pool_size (int or list/tuple of 3 ints,) – Size of the max pooling windows.
strides (int, list/tuple of 3 ints, or None.) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 3 ints,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. padding is applied on ‘D’, ‘H’ and ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.

Inputs:

data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.

Outputs:

out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:
```
out_depth = floor((depth+2*padding[0]-pool_size[0])/strides[0])+1
out_height = floor((height+2*padding[1]-pool_size[1])/strides[1])+1
out_width = floor((width+2*padding[2]-pool_size[2])/strides[2])+1
```
When ceil_mode is True, ceil will be used instead of floor in this equation.
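
As a worked example, the three formulas can be applied axis by axis. The helper `pool_out_dims` is hypothetical, and the input sizes (a 16-frame clip of 112x112 images) are made-up values chosen to show pooling that shrinks height and width while leaving depth unchanged.

```python
import math


def pool_out_dims(sizes, pool_size, strides, padding, ceil_mode=False):
    """Apply the per-axis formula to (depth, height, width)."""
    rounding = math.ceil if ceil_mode else math.floor
    return tuple(
        rounding((n + 2 * p - k) / s) + 1
        for n, k, s, p in zip(sizes, pool_size, strides, padding)
    )


# Pool spatially (2x2) but not along depth (window and stride of 1).
print(pool_out_dims((16, 112, 112), pool_size=(1, 2, 2),
                    strides=(1, 2, 2), padding=(0, 0, 0)))
```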

class mxnet.gluon.nn.conv_layers.ModulatedDeformableConvolution(channels, kernel_size=(1, 1), strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, num_deformable_group=1, layout='NCHW', use_bias=True, in_channels=0, activation=None, weight_initializer=None, bias_initializer='zeros', offset_weight_initializer='zeros', offset_bias_initializer='zeros', offset_use_bias=True, op_name='ModulatedDeformableConvolution', adj=None)[source]¶

Bases: HybridBlock

2-D Deformable Convolution v2 (Dai, 2018).

The modulated deformable convolution operation is described in https://arxiv.org/abs/1811.11168

Parameters:

channels (int,) – The dimensionality of the output space i.e. the number of output channels in the convolution.
kernel_size (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the strides of the convolution.
padding (int or tuple/list of 2 ints, (Default value = (0,0))) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
dilation (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dilation rate to use for dilated convolution.
groups (int, (Default value = 1)) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two convolution layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
num_deformable_group (int, (Default value = 1)) – Number of deformable group partitions.
layout (str, (Default value = NCHW)) – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, ‘NCHW’, ‘NHWC’, ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. Convolution is performed over ‘D’, ‘H’, and ‘W’ dimensions.
use_bias (bool, (Default value = True)) – Whether the layer for generating the output features uses a bias vector.
in_channels (int, (Default value = 0)) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and input channels will be inferred from the shape of input data.
activation (str, (Default value = None)) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).
weight_initializer (str or Initializer, (Default value = None)) – Initializer for the weights matrix of the convolution layer that generates the output features.
bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector for the convolution layer for generating the output features.
offset_weight_initializer (str or Initializer, (Default value = zeros)) – Initializer for the weights matrix of the convolution layer that generates the offset.
offset_bias_initializer (str or Initializer, (Default value = zeros),) – Initializer for the bias vector for the convolution layer for generating the offset.
offset_use_bias (bool, (Default value = True)) – Whether the layer for generating the offset uses a bias vector.

Inputs:

data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

Outputs:

out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:

out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1
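
The output-size formula above can be evaluated for one spatial axis with a short sketch. The function `conv_out_size` is hypothetical; it rewrites `dilation*(kernel_size-1)+1` as an effective kernel size, which is algebraically the same expression.

```python
import math


def conv_out_size(size, kernel_size, strides=1, padding=0, dilation=1):
    """Evaluate the output-size formula along one spatial axis."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return math.floor((size + 2 * padding - effective_kernel) / strides) + 1


print(conv_out_size(32, kernel_size=3, padding=1))               # size preserved
print(conv_out_size(32, kernel_size=3, padding=2, dilation=2))   # size preserved
print(conv_out_size(32, kernel_size=3, strides=2, padding=1))    # size halved
```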

apply(fn)¶

Applies fn recursively to every child block as well as self.

Parameters:: fn (callable) – Function to be applied to each submodule, of form fn(block).
Return type:: this block

cast(dtype)¶

Cast this Block to use another data type.

Parameters:: dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)¶

Returns a Dict containing the Parameters of this Block and all of its children (default). It can also return only the Parameters whose names match a given regular expression.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using a regular expression:

model.collect_params('.*weight|.*bias')

Parameters:: select (str) – regular expressions
Return type:: The selected Dict
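
The selection behaviour can be approximated with the standard `re` module. This is a toy model: `select_params` is a hypothetical function, the parameter names are made up, and matching from the start of each name with `re.match` is an assumption of the sketch.

```python
import re


def select_params(names, select=None):
    """Toy model of name selection; assumes matching from the start of each name."""
    if select is None:
        return list(names)
    pattern = re.compile(select)
    return [name for name in names if pattern.match(name)]


names = ["conv1.weight", "conv1.bias", "bn.gamma", "fc.weight", "fc.bias"]
print(select_params(names, ".*weight|.*bias"))
print(select_params(names, "conv1.weight|fc.bias"))
```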

export(path, epoch=0, remove_amp_cast=True)¶

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it is named data. When there is more than one input, they are named data0, data1, etc.
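
The input naming rule in this note amounts to the following sketch; `input_names` is a hypothetical helper used only to spell out the convention.

```python
def input_names(num_inputs):
    """Hypothetical helper: names that exported inputs receive."""
    if num_inputs == 1:
        return ["data"]
    return [f"data{i}" for i in range(num_inputs)]


print(input_names(1))
print(input_names(3))
```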

Parameters:

path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x)[source]¶: Overrides the forward computation. Arguments must be mxnet.numpy.ndarray.

hybridize(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters:

active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

infer_shape(x)[source]¶: Infers shape of Parameters from inputs.

infer_type(*args)¶: Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶

Initializes Parameters of this Block and its children.

Parameters:

init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)¶

Load a model saved using the save API

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:: prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from dict

Parameters:

param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from file previously saved by save_parameters.

Parameters:

filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

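optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.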

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters:

x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params¶: Returns this Block’s parameter dictionary (does not include its children’s parameters).

pre_infer_offset_weight()[source]¶: Pre-infer the shape of the offset weight parameter based on kernel size, group size and offset channels

pre_infer_weight()[source]¶: Pre-infer the shape of weight parameter based on kernel size, group size and channels

register_child(block, name=None)¶: Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)¶

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input, output) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)¶

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)¶

Install op hook for block recursively.

Parameters:

callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset_ctx(ctx)¶: This function has been deprecated. Please refer to HybridBlock.reset_device.

reset_device(device)¶

Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.

Parameters:: device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)¶

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:: prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)¶

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:

filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)¶

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)

Parameters:

name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)¶

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())

which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Parameters:: shared (Dict) – Dict of the shared parameters.
Return type:: this block

summary(*inputs)¶

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:: inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

zero_grad()¶: Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.nn.conv_layers.PixelShuffle1D(factor)[source]¶

Bases: HybridBlock

Pixel-shuffle layer for upsampling in 1 dimension.

Pixel-shuffling is the operation of taking groups of values along the channel dimension and regrouping them into blocks of pixels along the W dimension, thereby effectively multiplying that dimension by a constant factor in size.

For example, a feature map of shape \((fC, W)\) is reshaped into \((C, fW)\) by forming little value groups of size \(f\) and arranging them in a grid of size \(W\).

Parameters:

factor (int or 1-tuple of int) – Upsampling factor, applied to the W dimension.
Inputs –
- data: Tensor of shape (N, f*C, W).
Outputs –
- out: Tensor of shape (N, C, W*f).

Examples

>>> pxshuf = PixelShuffle1D(2)
>>> x = mx.np.zeros((1, 8, 3))
>>> pxshuf(x).shape
(1, 4, 6)
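The regrouping can also be read as an index mapping. The sketch below works on plain nested lists for a single sample (no batch dimension). The helper name `pixel_shuffle_1d` is hypothetical, and the channel ordering (input channel `c*f + i` fills output position `w*f + i` of output channel `c`) is an assumption about the layout rather than something stated above.

```python
def pixel_shuffle_1d(x, f):
    """Rearrange a (f*C, W) nested list into (C, W*f).

    Assumed layout: input channel c*f + i fills output position w*f + i
    of output channel c.
    """
    channels_in, width = len(x), len(x[0])
    if channels_in % f != 0:
        raise ValueError("channel count must be divisible by the factor")
    channels_out = channels_in // f
    return [
        [x[c * f + i][w] for w in range(width) for i in range(f)]
        for c in range(channels_out)
    ]


# 4 input channels with W = 3 and factor 2 give 2 output channels with W = 6.
x = [[0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]]
y = pixel_shuffle_1d(x, 2)
```

Under this assumed layout, each pair of consecutive input channels is interleaved along W to form one output channel.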

apply(fn)¶

Applies fn recursively to every child block as well as self.

Parameters:: fn (callable) – Function to be applied to each submodule, of form fn(block).
Return type:: this block

cast(dtype)¶

Cast this Block to use another data type.

Parameters:: dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)¶

Returns a Dict containing this Block’s and all of its children’s Parameters (default). It can also return only the Parameters whose names match a given regular expression.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using a regular expression:

model.collect_params('.*weight|.*bias')

Parameters:: select (str) – regular expressions
Return type:: The selected Dict
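How a `select` pattern filters parameter names can be tried out with the standard `re` module. This is a sketch under an assumption: names are tested with `re.match`, so a pattern is anchored at the start of the name but not at its end. The helper `select_names` and the name list are hypothetical.

```python
import re


def select_names(names, select):
    """Keep the names that the pattern matches from the start (re.match)."""
    pattern = re.compile(select)
    return [name for name in names if pattern.match(name)]


names = ["conv1.weight", "conv1.bias", "conv2.weight", "fc.weight", "fc.bias"]

# Explicit alternatives: only the listed names are kept.
explicit = select_names(names, "conv1.weight|conv1.bias|fc.weight|fc.bias")

# Suffix-style patterns: every name containing 'weight' or 'bias' is kept.
by_suffix = select_names(names, ".*weight|.*bias")

# Escaping the dot and anchoring the end makes the selection stricter.
weights_only = select_names(names, r".*\.weight$")
```

Note that an unescaped `.` matches any character, so escaping it and adding `$` is the safer choice when names share prefixes.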

export(path, epoch=0, remove_amp_cast=True)¶

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it is named data. When there is more than one input, they are named data0, data1, etc.

Parameters:

path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
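The two filenames described above can be derived from `path` and `epoch` as follows. This is a sketch only: `export_filenames` is a hypothetical helper, and the zero-padding is an assumption based on the "4-digit epoch number" wording.

```python
def export_filenames(path, epoch=0):
    """Build the symbol and parameter filenames for a given path prefix."""
    symbol_filename = f"{path}-symbol.json"
    # Assumption: the epoch number is zero-padded to four digits.
    params_filename = f"{path}-{epoch:04d}.params"
    return symbol_filename, params_filename


symbol_filename, params_filename = export_filenames("checkpoints/net", epoch=7)
```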

forward(x)[source]¶: Perform pixel-shuffling on the input.

hybridize(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters:

active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

infer_shape(*args)¶: Infers shape of Parameters from inputs.

infer_type(*args)¶: Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶

Initializes Parameters of this Block and its children.

Parameters:

init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)¶

Load a model saved using the save API

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:: prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from dict

Parameters:

param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter, if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from file previously saved by save_parameters.

Parameters:

filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter, if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters:

x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params¶: Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)¶: Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)¶

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input, output) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)¶

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)¶

Install op hook for block recursively.

Parameters:

callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset_ctx(ctx)¶: This function has been deprecated. Please refer to HybridBlock.reset_device.

reset_device(device)¶

Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.

Parameters:: device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)¶

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:: prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)¶

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:

filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)¶

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)

Parameters:

name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)¶

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())

which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Parameters:: shared (Dict) – Dict of the shared parameters.
Return type:: this block

summary(*inputs)¶

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:: inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

zero_grad()¶: Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.nn.conv_layers.PixelShuffle2D(factor)[source]¶

Bases: HybridBlock

Pixel-shuffle layer for upsampling in 2 dimensions.

Pixel-shuffling is the operation of taking groups of values along the channel dimension and regrouping them into blocks of pixels along the H and W dimensions, thereby effectively multiplying those dimensions by a constant factor in size.

For example, a feature map of shape \((f^2 C, H, W)\) is reshaped into \((C, fH, fW)\) by forming little \(f \times f\) blocks of pixels and arranging them in an \(H \times W\) grid.

Pixel-shuffling together with regular convolution is an alternative, learnable way of upsampling an image by arbitrary factors. It is reported to help overcome checkerboard artifacts that are common in upsampling with transposed convolutions (also called deconvolutions). See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network for further details.

Parameters:

factor (int or 2-tuple of int) – Upsampling factors, applied to the H and W dimensions, in that order.
Inputs –
- data: Tensor of shape (N, f1*f2*C, H, W).
Outputs –
- out: Tensor of shape (N, C, H*f1, W*f2).

Examples

>>> pxshuf = PixelShuffle2D((2, 3))
>>> x = mx.np.zeros((1, 12, 3, 5))
>>> pxshuf(x).shape
(1, 2, 6, 15)
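The same rearrangement can be sketched with a reshape, a transpose and a second reshape. The version below uses NumPy instead of MXNet arrays. The helper name `pixel_shuffle_2d` is hypothetical, and the way the channel axis is split into `(C, f1, f2)` is an assumption about the layout, not a statement of the library's implementation.

```python
import numpy as np


def pixel_shuffle_2d(x, factors):
    """Rearrange (N, f1*f2*C, H, W) into (N, C, H*f1, W*f2)."""
    f1, f2 = factors
    n, c_in, h, w = x.shape
    if c_in % (f1 * f2) != 0:
        raise ValueError("channel count must be divisible by f1 * f2")
    c = c_in // (f1 * f2)
    # Assumed layout: the channel axis splits into (C, f1, f2).
    x = x.reshape(n, c, f1, f2, h, w)
    # Move each factor axis next to the spatial axis it enlarges.
    x = x.transpose(0, 1, 4, 2, 5, 3)  # (N, C, H, f1, W, f2)
    return x.reshape(n, c, h * f1, w * f2)


x = np.arange(1 * 12 * 3 * 5).reshape(1, 12, 3, 5)
y = pixel_shuffle_2d(x, (2, 3))
```

With this layout, output element `y[n, c, h*f1 + i, w*f2 + j]` is taken from `x[n, (c*f1 + i)*f2 + j, h, w]`, so each input pixel expands into an `f1 × f2` block.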

apply(fn)¶

Applies fn recursively to every child block as well as self.

Parameters:: fn (callable) – Function to be applied to each submodule, of form fn(block).
Return type:: this block

cast(dtype)¶

Cast this Block to use another data type.

Parameters:: dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)¶

Returns a Dict containing this Block’s and all of its children’s Parameters (default). It can also return only the Parameters whose names match a given regular expression.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using a regular expression:

model.collect_params('.*weight|.*bias')

Parameters:: select (str) – regular expressions
Return type:: The selected Dict

export(path, epoch=0, remove_amp_cast=True)¶

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it is named data. When there is more than one input, they are named data0, data1, etc.

Parameters:

path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x)[source]¶: Perform pixel-shuffling on the input.

hybridize(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters:

active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

infer_shape(*args)¶: Infers shape of Parameters from inputs.

infer_type(*args)¶: Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶

Initializes Parameters of this Block and its children.

Parameters:

init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)¶

Load a model saved using the save API

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:: prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from dict

Parameters:

param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter, if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
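The interplay of `allow_missing` and `ignore_extra` can be sketched as a comparison of two name sets. The helper `check_param_names` below is hypothetical, and the exception type is an assumption; the library may report mismatches differently.

```python
def check_param_names(block_names, loaded_names,
                      allow_missing=False, ignore_extra=False):
    """Return the names that would be loaded, or raise on a mismatch."""
    block_names, loaded_names = set(block_names), set(loaded_names)
    missing = block_names - loaded_names  # expected by the Block, absent from the dict
    extra = loaded_names - block_names    # present in the dict, unknown to the Block
    if missing and not allow_missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    if extra and not ignore_extra:
        raise KeyError(f"unexpected parameters: {sorted(extra)}")
    return sorted(block_names & loaded_names)


# With both flags set, only the overlapping names are loaded.
loaded = check_param_names(
    ["weight", "bias"], ["weight", "running_mean"],
    allow_missing=True, ignore_extra=True,
)
```

With both flags left at `False`, the same call would raise, because `bias` is missing from the dictionary and `running_mean` is not a parameter of the Block.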

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from file previously saved by save_parameters.

Parameters:

filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter, if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters:

x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params¶: Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)¶: Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)¶

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input, output) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)¶

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)¶

Install op hook for block recursively.

Parameters:

callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset_ctx(ctx)¶: This function has been deprecated. Please refer to HybridBlock.reset_device.

reset_device(device)¶

Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.

Parameters:: device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)¶

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:: prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)¶

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:

filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)¶

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)

Parameters:

name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)¶

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())

which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Parameters:: shared (Dict) – Dict of the shared parameters.
Return type:: this block

summary(*inputs)¶

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:: inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

zero_grad()¶: Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.nn.conv_layers.PixelShuffle3D(factor)[source]¶

Bases: HybridBlock

Pixel-shuffle layer for upsampling in 3 dimensions.

Pixel-shuffling (or voxel-shuffling in 3D) is the operation of taking groups of values along the channel dimension and regrouping them into blocks of voxels along the D, H and W dimensions, thereby effectively multiplying those dimensions by a constant factor in size.

For example, a feature map of shape \((f^3 C, D, H, W)\) is reshaped into \((C, fD, fH, fW)\) by forming little \(f \times f \times f\) blocks of voxels and arranging them in a \(D \times H \times W\) grid.

Parameters:

factor (int or 3-tuple of int) – Upsampling factors, applied to the D, H and W dimensions, in that order.
Inputs –
- data: Tensor of shape (N, f1*f2*f3*C, D, H, W).
Outputs –
- out: Tensor of shape (N, C, D*f1, H*f2, W*f3).

Examples

>>> pxshuf = PixelShuffle3D((2, 3, 4))
>>> x = mx.np.zeros((1, 48, 3, 5, 7))
>>> pxshuf(x).shape
(1, 2, 6, 15, 28)
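The output shape follows directly from the Inputs and Outputs description: the channel count is divided by `f1*f2*f3` and each spatial dimension is multiplied by its factor. A small sketch of that arithmetic, using a hypothetical helper `pixel_shuffle_3d_shape`:

```python
def pixel_shuffle_3d_shape(in_shape, factors):
    """Compute the (N, C, D*f1, H*f2, W*f3) output shape."""
    n, c_in, d, h, w = in_shape
    f1, f2, f3 = factors
    block = f1 * f2 * f3
    if c_in % block != 0:
        raise ValueError("channel count must be divisible by f1 * f2 * f3")
    return (n, c_in // block, d * f1, h * f2, w * f3)


# Same shapes as the example above: 48 channels with factors (2, 3, 4).
out_shape = pixel_shuffle_3d_shape((1, 48, 3, 5, 7), (2, 3, 4))
```

The total number of elements is unchanged; only their arrangement across the channel and spatial axes differs.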

apply(fn)¶

Applies fn recursively to every child block as well as self.

Parameters:: fn (callable) – Function to be applied to each submodule, of form fn(block).
Return type:: this block

cast(dtype)¶

Cast this Block to use another data type.

Parameters:: dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)¶

Returns a Dict containing this Block’s and all of its children’s Parameters (default). It can also return only the Parameters whose names match a given regular expression.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using a regular expression:

model.collect_params('.*weight|.*bias')

Parameters:: select (str) – regular expressions
Return type:: The selected Dict

export(path, epoch=0, remove_amp_cast=True)¶

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it is named data. When there is more than one input, they are named data0, data1, etc.

Parameters:

path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x)[source]¶: Perform pixel-shuffling on the input.

hybridize(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters:

active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

infer_shape(*args)¶: Infers shape of Parameters from inputs.

infer_type(*args)¶: Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶

Initializes Parameters of this Block and its children.

Parameters:

init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)¶

Load a model saved using the save API

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:: prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from dict

Parameters:

param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter, if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True. Specifies the source of the dtype used for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from file previously saved by save_parameters.

Parameters:

filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True. Specifies the source of the dtype used for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args[, backend, ...])¶

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters:

x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str, default None) – The name of the backend, as registered in SubgraphBackendRegistry.
backend_opts (dict, optional) – User-specified options to pass to the backend for partitioning. Passed on to the PrePartition and PostPartition functions of SubgraphProperty.
clear (bool, default False) – Whether to clear any previous optimizations.
partition_if_dynamic (bool, default False) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (optional) – The backend options. Passed on to the PrePartition and PostPartition functions of SubgraphProperty.

property params¶: Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)¶: Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)¶

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input, output) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)¶

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input.

Parameters:: hook (callable) – The forward pre-hook function of form hook(block, input) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)¶

Install op hook for block recursively.

Parameters:

callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset_ctx(ctx)¶: This function has been deprecated. Please refer to HybridBlock.reset_device.

reset_device(device)¶

Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.

Parameters:: device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)¶

Save the model architecture and parameters so that they can be loaded again later.

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:: prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)¶

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:

filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)¶

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)

Parameters:

name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)¶

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())

which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Parameters:: shared (Dict) – Dict of the shared parameters.
Return type:: this block

summary(*inputs)¶

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:: inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

zero_grad()¶: Sets all Parameters’ gradient buffer to 0.

class mxnet.gluon.nn.conv_layers.ReflectionPad2D(padding=0, **kwargs)[source]¶

Bases: HybridBlock

Pads the input tensor using the reflection of the input boundary.

Parameters:: padding (int) – An integer padding size

Inputs:

data: input tensor with the shape \((N, C, H_{in}, W_{in})\).

Outputs:

out: output tensor with the shape \((N, C, H_{out}, W_{out})\), where

\[ \begin{align}\begin{aligned}H_{out} = H_{in} + 2 \cdot padding\\W_{out} = W_{in} + 2 \cdot padding\end{aligned}\end{align} \]
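
The output-shape relation above can be checked on a small example. The following pure-Python sketch mirrors values around the border without repeating the edge element, which is the usual meaning of "reflect" padding; it operates on a single (H, W) plane given as nested lists, and the helper names are hypothetical. It is not the layer's actual implementation.

```python
def reflect_pad_1d(row, padding):
    """Pad a sequence by mirroring around its first and last elements."""
    if padding >= len(row):
        raise ValueError("padding must be smaller than the sequence length")
    left = row[padding:0:-1]            # elements 1..padding, reversed
    right = row[-2:-padding - 2:-1]     # the padding elements before the last, reversed
    return list(left) + list(row) + list(right)

def reflect_pad_2d(plane, padding):
    """Apply reflection padding along W (each row), then along H (the list of rows)."""
    rows = [reflect_pad_1d(r, padding) for r in plane]
    return [list(r) for r in reflect_pad_1d(rows, padding)]

plane = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = reflect_pad_2d(plane, 1)
for r in padded:
    print(r)
```

With H_in = W_in = 3 and padding = 1, the result has 3 + 2 * 1 = 5 rows and 5 columns.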

Examples

>>> m = nn.ReflectionPad2D(3)
>>> input = mx.np.random.normal(size=(16, 3, 224, 224))
>>> output = m(input)

apply(fn)¶

Applies fn recursively to every child block as well as self.

Parameters:: fn (callable) – Function to be applied to each submodule, of form fn(block).
Return type:: this block

cast(dtype)¶

Cast this Block to use another data type.

Parameters:: dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)¶

Returns a Dict containing the Parameters of this Block and all of its children (the default). Can also return only the Parameters whose names match a given regular expression.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ by using a regular expression:

model.collect_params('.*weight|.*bias')

Parameters:: select (str) – regular expressions
Return type:: The selected Dict

export(path, epoch=0, remove_amp_cast=True)¶

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it is named data. When there is more than one input, the inputs are named data0, data1, etc.

Parameters:

path (str or None) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number. If None, do not export to file; instead return the Python Symbol object and the corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns:

symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x)[source]¶: Uses the pad operator in the NumPy extension module, which supports the backward pass in reflect mode.

hybridize(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters:

active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

infer_shape(*args)¶: Infers shape of Parameters from inputs.

infer_type(*args)¶: Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶

Initializes Parameters of this Block and its children.

Parameters:

init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)¶

Load a model saved using the save API

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters:: prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from dict

Parameters:

param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter, if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True. Specifies the source of the dtype used for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶

Load parameters from file previously saved by save_parameters.

Parameters:

filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True. Specifies the source of the dtype used for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args[, backend, ...])¶

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters:

x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str, default None) – The name of the backend, as registered in SubgraphBackendRegistry.
backend_opts (dict, optional) – User-specified options to pass to the backend for partitioning. Passed on to the PrePartition and PostPartition functions of SubgraphProperty.
clear (bool, default False) – Whether to clear any previous optimizations.
partition_if_dynamic (bool, default False) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (optional) – The backend options. Passed on to the PrePartition and PostPartition functions of SubgraphProperty.

property params¶: Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)¶: Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)¶

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters:: hook (callable) – The forward hook function of form hook(block, input, output) -> None.
Return type:: mxnet.gluon.utils.HookHandle
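
The calling convention described above (the hook runs right after forward() and receives the block, the input, and the output) can be illustrated with a toy stand-in. `ToyBlock` and `ToyHandle` below are hypothetical classes written for this sketch, including the `detach` method on the toy handle; they are not the MXNet implementation.

```python
class ToyHandle:
    """Toy handle that can remove a registered hook again."""
    def __init__(self, hooks, key):
        self._hooks = hooks
        self._key = key

    def detach(self):
        self._hooks.pop(self._key, None)

class ToyBlock:
    """Toy block that doubles its input and calls hooks after forward()."""
    def __init__(self):
        self._forward_hooks = {}
        self._next_key = 0

    def register_forward_hook(self, hook):
        key = self._next_key
        self._next_key += 1
        self._forward_hooks[key] = hook
        return ToyHandle(self._forward_hooks, key)

    def forward(self, x):
        return [2 * v for v in x]

    def __call__(self, x):
        out = self.forward(x)
        # Hooks run immediately after forward() and only observe the values.
        for hook in list(self._forward_hooks.values()):
            hook(self, x, out)
        return out

seen = []

def record_lengths(block, inputs, output):
    # Read-only hook: records sizes without modifying input or output.
    seen.append((len(inputs), len(output)))

block = ToyBlock()
handle = block.register_forward_hook(record_lengths)
first = block([1, 2, 3])   # hook fires once
handle.detach()
second = block([4])        # hook no longer fires
print(seen)
```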

register_forward_pre_hook(hook)¶

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input.

Parameters:: hook (callable) – The forward pre-hook function of form hook(block, input) -> None.
Return type:: mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)¶

Install op hook for block recursively.

Parameters:

callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset_ctx(ctx)¶: This function has been deprecated. Please refer to HybridBlock.reset_device.

reset_device(device)¶

Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.

Parameters:: device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)¶

Save the model architecture and parameters so that they can be loaded again later.

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters:: prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)¶

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters:

filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)¶

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)

Parameters:

name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)¶

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())

which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Parameters:: shared (Dict) – Dict of the shared parameters.
Return type:: this block

summary(*inputs)¶

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters:: inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

zero_grad()¶: Sets all Parameters’ gradient buffer to 0.