mxnet.symbol.numpy_extension

Module for the ops not belonging to the official numpy package.

Functions

`activation`([data, act_type, name, attr, out])	Applies an activation function element-wise to the input.
`add_n`(*args, **kwargs)	Adds all input arguments element-wise.
`arange_like`([data, start, step, repeat, ...])	Return an array with evenly spaced values.
`batch_dot`([lhs, rhs, transpose_a, ...])	Batchwise dot product.
`batch_flatten`([data, name, attr, out])	Flattens the input array into a 2-D array by collapsing the higher dimensions.
`batch_norm`([data, gamma, beta, moving_mean, ...])	Batch normalization.
`bipartite_matching`([data, is_ascend, ...])	Compute bipartite matching.
`box_decode`([data, anchors, std0, std1, ...])	Decode bounding boxes training target with normalized center offsets.
`box_encode`([samples, matches, anchors, ...])	Encode bounding boxes training target with normalized center offsets.
`box_iou`([lhs, rhs, format, name, attr, out])	Bounding box overlap of two arrays.
`box_nms`([data, overlap_thresh, ...])	Apply non-maximum suppression to input.
`broadcast_greater`([lhs, rhs, name, attr, out])	Returns the result of element-wise greater than (>) comparison operation with broadcasting.
`broadcast_like`([lhs, rhs, lhs_axes, ...])	Broadcasts lhs to have the same shape as rhs.
`cast`([data, dtype, name, attr, out])	Casts all elements of the input to a new type.
`cond`(*data, **kwargs)	Run an if-then-else using user-defined condition and computation
`constraint_check`([input, msg, name, attr, out])	This operator will check if all the elements in a boolean tensor are true.
`contrib_calibrate_entropy`([hist, ...])	Provide calibrated min/max for input histogram.
`contrib_quantize`([data, min_range, ...])	Quantize an input tensor from float to out_type, with user-specified min_range and max_range.
`contrib_quantize_v2`([data, out_type, ...])	Quantize an input tensor from float to out_type, with user-specified min_calib_range and max_calib_range or the input range collected at runtime.
`contrib_quantized_rnn`([data, parameters, ...])	RNN operator for input data type of uint8.
`convolution`([data, weight, bias, kernel, ...])	Compute N-D convolution on (N+2)-D input.
`ctc_loss`([data, label, data_lengths, ...])	Connectionist Temporal Classification Loss.
`deconvolution`([data, weight, bias, kernel, ...])	Computes 1D, 2D or 3D transposed convolution (aka fractionally strided convolution) of the input tensor.
`deformable_convolution`([data, offset, ...])	Compute 2-D deformable convolution on 4-D input.
`digamma`([data, name, attr, out])	Returns element-wise log derivative of the gamma function of the input.
`dropout`([data, p, mode, axes, cudnn_off, ...])	Applies dropout operation to input array.
`embedding`([data, weight, input_dim, ...])	Maps integer indices to vector representations (embeddings).
`erf`([data, name, attr, out])	Returns element-wise gauss error function of the input.
`erfinv`([data, name, attr, out])	Returns element-wise inverse gauss error function of the input.
`foreach`(*data, **kwargs)	Run a for loop over an ndarray with user-defined computation
`fully_connected`([data, weight, bias, ...])	Applies a linear transformation: \(Y = XW^T + b\).
`gamma`([data, name, attr, out])	Returns the gamma function (extension of the factorial function to the reals), computed element-wise on the input array.
`gammaln`([data, name, attr, out])	Returns element-wise log of the absolute value of the gamma function of the input.
`gather_nd`([data, indices, name, attr, out])	Gather elements or slices from data and store to a tensor whose shape is defined by indices.
`group_norm`([data, gamma, beta, num_groups, ...])	Group normalization.
`index_add`([a, ind, val, name, attr, out])	Add values to input according to given indexes.
`index_update`([a, ind, val, name, attr, out])	Update values to input according to given indexes.
`instance_norm`([data, gamma, beta, eps, ...])	Applies instance normalization to the n-dimensional input array.
`interleaved_matmul_encdec_qk`([queries, ...])	Compute the matrix multiplication between the projections of queries and keys in multihead attention use as encoder-decoder.
`interleaved_matmul_encdec_valatt`([...])	Compute the matrix multiplication between the projections of values and the attention weights in multihead attention use as encoder-decoder.
`interleaved_matmul_selfatt_qk`([...])	Compute the matrix multiplication between the projections of queries and keys in multihead attention use as self attention.
`interleaved_matmul_selfatt_valatt`([...])	Compute the matrix multiplication between the projections of values and the attention weights in multihead attention use as self attention.
`intgemm_fully_connected`([data, weight, ...])	Multiply matrices using 8-bit integers.
`intgemm_maxabsolute`([data, name, attr, out])	Compute the maximum absolute value in a tensor of float32 fast on a CPU.
`intgemm_prepare_data`([data, maxabs, name, ...])	This operator quantizes float32 to int8 while also banning -128.
`intgemm_prepare_weight`([weight, maxabs, ...])	This operator converts a weight matrix in column-major format to intgemm's internal fast representation of weight matrices.
`intgemm_take_weight`([weight, indices, name, ...])	Index a weight matrix stored in intgemm's weight format.
`layer_norm`([data, gamma, beta, axis, eps, ...])	Layer normalization.
`leaky_relu`([data, gamma, act_type, slope, ...])	Applies Leaky rectified linear unit activation element-wise to the input.
`log_softmax`([data, axis, temperature, ...])	Computes the log softmax of the input.
`masked_log_softmax`([data, mask, axis, ...])	Computes the masked log softmax of the input.
`masked_softmax`([data, mask, axis, ...])	Applies the softmax function masking elements according to the mask provided
`modulated_deformable_convolution`([data, ...])	Compute 2-D modulated deformable convolution on 4-D input.
`multibox_detection`([cls_prob, loc_pred, ...])	Convert multibox detection predictions.
`multibox_prior`([data, sizes, ratios, clip, ...])	Generate prior (anchor) boxes from data, sizes and ratios.
`multibox_target`([anchor, label, cls_pred, ...])	Compute Multibox training targets
`nonzero`([x, name, attr, out])	Return the indices of the elements that are non-zero.
`norm`([data, ord, axis, out_dtype, keepdims, ...])	Computes the norm on an ndarray.
`one_hot`([indices, depth, on_value, ...])	Returns a one-hot array.
`pad`([data, mode, pad_width, constant_value, ...])	Pads an input array with a constant or edge values of the array.
`pick`([data, index, axis, keepdims, mode, ...])	Picks elements from an input array according to the input indices along the given axis.
`pooling`([data, kernel, pool_type, ...])	Performs pooling on the input.
`quantized_act`([data, min_data, max_data, ...])	Activation operator for input and output data type of int8.
`quantized_conv`([data, weight, bias, ...])	Convolution operator for input, weight and bias data type of int8, and accumulates in type int32 for the output.
`quantized_elemwise_add`([lhs, rhs, lhs_min, ...])	elemwise_add operator for input dataA and input dataB data type of int8, and accumulates in type int32 for the output.
`quantized_elemwise_mul`([lhs, rhs, lhs_min, ...])	Multiplies arguments int8 element-wise.
`quantized_embedding`([data, weight, ...])	Maps integer indices to int8 vector representations (embeddings).
`quantized_flatten`([data, min_data, ...])
`quantized_fully_connected`([data, weight, ...])	Fully Connected operator for input, weight and bias data type of int8, and accumulates in type int32 for the output.
`quantized_npi_add`([lhs, rhs, lhs_min, ...])	elemwise_add operator for input dataA and input dataB data type of int8, and accumulates in type int32 for the output.
`quantized_pooling`([data, min_data, ...])	Pooling operator for input and output data type of int8.
`quantized_reshape`([data, min_data, ...])
`quantized_transpose`([data, min_data, ...])
`relu`([data, name, attr, out])	Computes rectified linear activation.
`requantize`([data, min_range, max_range, ...])	Given data that is quantized in int32 and the corresponding thresholds, requantize the data into int8 using min and max thresholds either calculated at runtime or from calibration.
`reshape`([a, newshape, reverse, order, name, ...])	Gives a new shape to an array without changing its data.
`reshape_like`([lhs, rhs, lhs_begin, lhs_end, ...])	Reshape some or all dimensions of lhs to have the same shape as some or all dimensions of rhs.
`rnn`([data, parameters, state, state_cell, ...])	Applies recurrent layers to input data.
`roi_pooling`([data, rois, pooled_size, ...])	Performs region of interest (ROI) pooling on the input array.
`round_ste`([data, name, attr, out])	Straight-through-estimator of round().
`scalar_poisson`([lam, shape, ctx, dtype, ...])	Draw random samples from a Poisson distribution.
`sequence_last`([data, sequence_length, ...])	Takes the last element of a sequence.
`sequence_mask`([data, sequence_length, ...])	Sets all elements outside the sequence to a constant value.
`sequence_reverse`([data, sequence_length, ...])	Reverses the elements of each sequence.
`shape_array`([data, name, attr, out])	Returns a 1D int64 array containing the shape of data.
`sigmoid`([data, name, attr, out])	Computes sigmoid of x element-wise.
`sign_ste`([data, name, attr, out])	Straight-through-estimator of sign().
`sldwin_atten_context`([score, value, ...])	Compute the context vector for sliding window attention, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).
`sldwin_atten_mask_like`([score, dilation, ...])	Compute the mask for the sliding window attention score, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).
`sldwin_atten_score`([query, key, dilation, ...])	Compute the sliding window attention score, which is used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).
`slice`([data, begin, end, step, name, attr, out])	Slices a region of the array.
`slice_channel`([data, num_outputs, axis, ...])	Splits an array along a particular axis into multiple sub-arrays.
`slice_like`([data, shape_like, axes, name, ...])	Slices a region of the array like the shape of another array. This function is similar to `slice`; however, the `begin` values are always 0 and the `end` values of the specified axes are inferred from the second input, `shape_like`. Given a second input `shape_like` with `shape=(d_0, d_1, ..., d_n-1)`, a `slice_like` operator with the default empty `axes` performs the following operation: `out = slice(input, begin=(0, 0, ..., 0), end=(d_0, d_1, ..., d_n-1))`. When `axes` is not empty, it specifies which axes are sliced. Given a 4-D input, a `slice_like` operator with `axes=(0, 2, -1)` performs the following operation: `out = slice(input, begin=(0, 0, 0, 0), end=(d_0, None, d_2, d_3))`. The first and second inputs may have different numbers of dimensions, but in that case `axes` must be specified and must not exceed the dimension limits. For example, given input_1 with `shape=(2,3,4,5)` and input_2 with `shape=(1,2,3)`, `out = slice_like(a, b)` is not allowed because the ndim of input_1 is 4 and the ndim of input_2 is 3. The following is allowed in this situation: `out = slice_like(a, b, axes=(0, 2))`.
`smooth_l1`([data, scalar, name, attr, out])	Calculate Smooth L1 Loss(lhs, scalar) by summing
`softmax`([data, length, axis, temperature, ...])	Applies the softmax function.
`softsign`([data, name, attr, out])	Computes softsign of x element-wise.
`stop_gradient`([data, name, attr, out])	Stops gradient computation.
`sync_batch_norm`([data, gamma, beta, ...])	Batch normalization.
`tensor_poisson`([lam, shape, dtype, name, ...])	Concurrent sampling from multiple Poisson distributions with parameters lambda (rate).
`topk`([data, axis, k, ret_typ, is_ascend, ...])	Returns the indices of the top k elements in an input array along the given
`while_loop`(*data, **kwargs)	Run a while loop with user-defined condition and computation

mxnet.symbol.numpy_extension.activation(data=None, act_type=_Null, name=None, attr=None, out=None, **kwargs)

Applies an activation function element-wise to the input.

The following activation functions are supported:

relu: Rectified Linear Unit, \(y = max(x, 0)\)
sigmoid: \(y = \frac{1}{1 + exp(-x)}\)
log_sigmoid: \(y = log(\frac{1}{1 + exp(-x)})\)
mish: \(y = x * tanh(log(1 + exp(x)))\)
tanh: Hyperbolic tangent, \(y = \frac{exp(x) - exp(-x)}{exp(x) + exp(-x)}\)
softrelu: Soft ReLU, or SoftPlus, \(y = log(1 + exp(x))\)
softsign: \(y = \frac{x}{1 + abs(x)}\)
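
As a rough illustration, the formulas above can be written as scalar Python functions. This is a sketch using only the standard library, not MXNet's implementation; the names `ACTIVATIONS` and `apply_activation` are hypothetical and exist only for this example.

```python
import math

# Scalar reference versions of the activation formulas listed above.
# ACTIVATIONS and apply_activation are illustrative names, not MXNet APIs.
ACTIVATIONS = {
    "relu": lambda x: max(x, 0.0),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "log_sigmoid": lambda x: math.log(1.0 / (1.0 + math.exp(-x))),
    "mish": lambda x: x * math.tanh(math.log(1.0 + math.exp(x))),
    "tanh": lambda x: (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x)),
    "softrelu": lambda x: math.log(1.0 + math.exp(x)),
    "softsign": lambda x: x / (1.0 + abs(x)),
}


def apply_activation(data, act_type):
    """Apply the named activation to every element of a flat list."""
    func = ACTIVATIONS[act_type]
    return [func(x) for x in data]


print(apply_activation([-1.0, 0.0, 2.0], "relu"))
```

The formulas are transcribed literally, so very large inputs can overflow `math.exp`; a production kernel would normally use a numerically stable form.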

Defined in /home/smola/mxnet/src/operator/nn/activation.cc:L183

Parameters:

data (Symbol) – The input array.
act_type ({'log_sigmoid', 'mish', 'relu', 'sigmoid', 'softrelu', 'softsign', 'tanh'}, required) – Activation function to be applied.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.add_n(*args, **kwargs)

Adds all input arguments element-wise.

\[add\_n(a_1, a_2, ..., a_n) = a_1 + a_2 + ... + a_n\]

add_n is potentially more efficient than calling add n times.
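
The element-wise sum defined above can be sketched in plain Python for flat lists. This only illustrates the arithmetic; the function below is a stand-in written for this example and does not model storage types or MXNet symbols.

```python
def add_n(*args):
    """Element-wise sum of any number of equally sized flat lists."""
    if not args:
        raise ValueError("add_n needs at least one input")
    length = len(args[0])
    if any(len(a) != length for a in args):
        raise ValueError("all inputs must have the same length")
    # zip(*args) groups the i-th element of every input together.
    return [sum(column) for column in zip(*args)]


print(add_n([1, 2], [3, 4], [5, 6]))
```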

The storage type of the add_n output depends on the storage types of the inputs:

add_n(row_sparse, row_sparse, ..) = row_sparse
add_n(default, csr, default) = default
add_n(any input combinations longer than 4 (>4) with at least one default type) = default
otherwise, add_n falls all inputs back to default storage and generates default storage

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_sum.cc:L158

This function supports a variable number of positional inputs.

Parameters:

args (Symbol[]) – Positional input arguments
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.arange_like(data=None, start=_Null, step=_Null, repeat=_Null, ctx=_Null, axis=_Null, name=None, attr=None, out=None, **kwargs)

Return an array with evenly spaced values. If axis is not given, the output will have the same shape as the input array. Otherwise, the output will be a 1-D array with size of the specified axis in input shape.
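
The two output modes described above can be sketched in plain Python. This is a simplified model under stated assumptions: it takes a shape tuple instead of an input array, ignores `repeat` and `ctx`, and returns nested lists; the function is a stand-in for illustration, not the MXNet operator.

```python
from math import prod


def arange_like(shape, start=0.0, step=1.0, axis=None):
    """Evenly spaced values shaped like `shape`, or 1-D along `axis`."""
    if axis is not None:
        # A negative axis counts from the end, as with Python indexing.
        return [start + i * step for i in range(shape[axis])]
    flat = [start + i * step for i in range(prod(shape))]
    # Fold the flat list back into nested lists, innermost axis first.
    for size in reversed(shape[1:]):
        flat = [flat[i:i + size] for i in range(0, len(flat), size)]
    return flat


print(arange_like((3, 4)))
print(arange_like((3, 4), axis=-1))
```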

Examples:

x = [[0.14883883 0.7772398  0.94865847 0.7225052 ]
     [0.23729339 0.6112595  0.66538996 0.5132841 ]
     [0.30822644 0.9912457  0.15502319 0.7043658 ]]
     <ndarray 3x4 @cpu(0)>

out = mx.nd.contrib.arange_like(x, start=0)

  [[ 0.  1.  2.  3.]
   [ 4.  5.  6.  7.]
   [ 8.  9. 10. 11.]]
   <ndarray 3x4 @cpu(0)>

out = mx.nd.contrib.arange_like(x, start=0, axis=-1)

  [0. 1. 2. 3.]
  <ndarray 4 @cpu(0)>

Parameters:

data (Symbol) – The input
start (double, optional, default=0) – Start of interval. The interval includes this value. The default start value is 0.
step (double, optional, default=1) – Spacing between values.
repeat (int, optional, default='1') – The number of times each element is repeated. E.g., with repeat=3, the element a will be repeated three times –> a, a, a.
ctx (string, optional, default='') – Context of output, in format [cpu|gpu|cpu_pinned](n). Only used for imperative calls.
axis (int or None, optional, default='None') – Arange elements according to the size of a certain axis of the input array. Negative numbers are interpreted as counting from the end. If not provided, will arange elements according to the input shape.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.batch_dot(lhs=None, rhs=None, transpose_a=_Null, transpose_b=_Null, forward_stype=_Null, name=None, attr=None, out=None, **kwargs)

Batchwise dot product.

batch_dot is used to compute the dot product of x and y when x and y are data in batch, namely N-D (N >= 3) arrays in shape of (B_0, …, B_i, :, :).

For example, given x with shape (B_0, …, B_i, N, M) and y with shape (B_0, …, B_i, M, K), the result array will have shape (B_0, …, B_i, N, K), which is computed by:

batch_dot(x,y)[b_0, ..., b_i, :, :] = dot(x[b_0, ..., b_i, :, :], y[b_0, ..., b_i, :, :])
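
For the simplest case of a single batch axis (3-D inputs), the rule above can be sketched with nested lists. The helper `dot` is a plain matrix product written for this example; neither function models `transpose_a`, `transpose_b`, or storage types.

```python
def dot(a, b):
    """Plain matrix product of two 2-D nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def batch_dot(x, y):
    """Apply dot to each pair of matrices along the leading batch axis."""
    if len(x) != len(y):
        raise ValueError("batch sizes must match")
    return [dot(xb, yb) for xb, yb in zip(x, y)]


x = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]   # shape (2, 2, 2)
y = [[[5], [6]], [[7], [8]]]               # shape (2, 2, 1)
print(batch_dot(x, y))                     # result has shape (2, 2, 1)
```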

Defined in /home/smola/mxnet/src/operator/tensor/dot.cc:L188

Parameters:

lhs (Symbol) – The first input
rhs (Symbol) – The second input
transpose_a (boolean, optional, default=0) – If true then transpose the first input before dot.
transpose_b (boolean, optional, default=0) – If true then transpose the second input before dot.
forward_stype ({None, 'csr', 'default', 'row_sparse'}, optional, default='None') – The desired storage type of the forward output given by the user. If the combination of input storage types and this hint does not match any implemented ones, the dot operator will perform a fallback operation and still produce an output of the desired storage type.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.batch_flatten(data=None, name=None, attr=None, out=None, **kwargs)

Flattens the input array into a 2-D array by collapsing the higher dimensions.

Note

Flatten is deprecated. Use flatten instead.

For an input array with shape (d1, d2, ..., dk), the flatten operation reshapes the input array into an output array of shape (d1, d2*...*dk).

Note that the behavior of this function is different from numpy.ndarray.flatten, which behaves similarly to mxnet.ndarray.reshape((-1,)).

Example:

x = [[
    [1,2,3],
    [4,5,6],
    [7,8,9]
],
[    [1,2,3],
    [4,5,6],
    [7,8,9]
]],
flatten(x) = [[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.],
   [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]]
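
The reshape from (d1, d2, ..., dk) to (d1, d2*...*dk) can be sketched with nested lists: the first axis is kept and everything below it is collapsed. The helper `flatten_all` is a hypothetical name used only in this example.

```python
def flatten_all(value):
    """Recursively collect every scalar inside a nested list."""
    if not isinstance(value, list):
        return [value]
    out = []
    for item in value:
        out.extend(flatten_all(item))
    return out


def batch_flatten(x):
    """Keep the first axis and collapse all remaining axes into one."""
    return [flatten_all(sample) for sample in x]


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
print(batch_flatten(x))
```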

Defined in /home/smola/mxnet/src/operator/tensor/matrix_op.cc:L278

Parameters:

data (Symbol) – Input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.batch_norm(data=None, gamma=None, beta=None, moving_mean=None, moving_var=None, eps=_Null, momentum=_Null, fix_gamma=_Null, use_global_stats=_Null, output_mean_var=_Null, axis=_Null, cudnn_off=_Null, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)

Batch normalization.

Normalizes a data batch by mean and variance, and applies a scale gamma as well as offset beta.

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:

\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]

Then compute the normalized output, which has the same shape as the input, as follows:

\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]

Both mean and var return a scalar by treating the input as a vector.

Assume the input has size k on axis 1, then both gamma and beta have shape (k,). If output_mean_var is set to true, the operator outputs both data_mean and the inverse of data_var, which are needed for the backward pass. Note that the gradients of these two outputs are blocked.

Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var, which are k-length vectors. They are global statistics for the whole dataset, which are updated by:

moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
moving_var = moving_var * momentum + data_var * (1 - momentum)
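
The normalization and the moving-average update can be sketched for a 2-D input of shape (N, k) normalized along axis 1. This is an illustration under assumptions, not the operator itself: the name `batch_norm_2d` is hypothetical, the variance is taken as the biased (divide by N) estimate, and the defaults for `eps` and `momentum` are rounded from the parameter list.

```python
import math


def batch_norm_2d(data, gamma, beta, moving_mean, moving_var,
                  eps=1e-3, momentum=0.9):
    """Normalize an (N, k) nested list per column and update moving stats."""
    n, k = len(data), len(data[0])
    # Per-channel statistics over the batch axis (biased variance assumed).
    mean = [sum(row[i] for row in data) / n for i in range(k)]
    var = [sum((row[i] - mean[i]) ** 2 for row in data) / n for i in range(k)]
    out = [[(row[i] - mean[i]) / math.sqrt(var[i] + eps) * gamma[i] + beta[i]
            for i in range(k)] for row in data]
    # Moving averages follow the update rule given in the text.
    new_mean = [moving_mean[i] * momentum + mean[i] * (1 - momentum)
                for i in range(k)]
    new_var = [moving_var[i] * momentum + var[i] * (1 - momentum)
               for i in range(k)]
    return out, new_mean, new_var


out, new_mean, new_var = batch_norm_2d(
    [[1.0, 2.0], [3.0, 6.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0],
    moving_mean=[0.0, 0.0], moving_var=[1.0, 1.0])
print(out, new_mean, new_var)
```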

If use_global_stats is set to be true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.

The parameter axis specifies which axis of the input shape denotes the ‘channel’ (separately normalized groups). The default is 1. Specifying -1 sets the channel axis to be the last item in the input shape.

Both gamma and beta are learnable parameters. But if fix_gamma is true, then set gamma to 1 and its gradient to 0.

Note

When fix_gamma is set to True, no sparse support is provided. If fix_gamma is set to False, the sparse tensors will fall back.

Defined in /home/smola/mxnet/src/operator/nn/batch_norm.cc:L636

Parameters:

data (Symbol) – Input data to batch normalization
gamma (Symbol) – gamma array
beta (Symbol) – beta array
moving_mean (Symbol) – running mean of input
moving_var (Symbol) – running variance of input
eps (double, optional, default=0.0010000000474974513) – Epsilon to prevent div 0. Must be no less than CUDNN_BN_MIN_EPSILON defined in cudnn.h when using cudnn (usually 1e-5)
momentum (float, optional, default=0.899999976) – Momentum for moving average
fix_gamma (boolean, optional, default=1) – Fix gamma while training
use_global_stats (boolean, optional, default=0) – Whether to use global moving statistics instead of local batch-norm. This will force change batch-norm into a scale shift operator.
output_mean_var (boolean, optional, default=0) – Output the mean and inverse std
axis (int, optional, default='1') – Specify which axis of the input shape is the channel axis
cudnn_off (boolean, optional, default=0) – Do not select CUDNN operator, if available
min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output.
max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.bipartite_matching(data=None, is_ascend=_Null, threshold=_Null, topk=_Null, name=None, attr=None, out=None, **kwargs)

Compute bipartite matching.

The matching is performed on a score matrix with shape [B, N, M]:

- B: batch_size
- N: number of rows to match
- M: number of columns as reference to be matched against.

Returns:

- x: matched column indices; -1 indicates non-matched elements in rows.
- y: matched row indices.

Note:

Zero gradients are back-propagated in this op for now.

Example:

s = [[0.5, 0.6], [0.1, 0.2], [0.3, 0.4]]
x, y = bipartite_matching(s, threshold=1e-12, is_ascend=False)
x = [1, -1, 0]
y = [2, 0]
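
One way to arrive at the values in this example is a greedy matching that visits scores from best to worst and pairs each row and column at most once. The sketch below assumes that strategy for a single score matrix (no batch axis); it is not taken from the MXNet source, and the function is a stand-in written for illustration.

```python
def bipartite_matching(scores, threshold, is_ascend=False, topk=-1):
    """Greedy row/column matching on one N x M score matrix."""
    n_rows, n_cols = len(scores), len(scores[0])
    # Visit candidate pairs from best to worst score.
    pairs = sorted(
        ((scores[r][c], r, c) for r in range(n_rows) for c in range(n_cols)),
        key=lambda item: item[0],
        reverse=not is_ascend,
    )
    row_match = [-1] * n_rows   # x: matched column index for each row
    col_match = [-1] * n_cols   # y: matched row index for each column
    matched = 0
    for score, r, c in pairs:
        beyond = score > threshold if is_ascend else score < threshold
        if beyond:
            break               # every remaining pair is also past the threshold
        if row_match[r] == -1 and col_match[c] == -1:
            row_match[r], col_match[c] = c, r
            matched += 1
            if topk > 0 and matched >= topk:
                break
    return row_match, col_match


s = [[0.5, 0.6], [0.1, 0.2], [0.3, 0.4]]
x_idx, y_idx = bipartite_matching(s, threshold=1e-12, is_ascend=False)
print(x_idx, y_idx)
```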

Defined in /home/smola/mxnet/src/operator/contrib/bounding_box.cc:L191

Parameters:

data (Symbol) – The input
is_ascend (boolean, optional, default=0) – Use ascend order for scores instead of descending. Please set threshold accordingly.
threshold (float, required) – Ignore matching when score < thresh, if is_ascend=false, or ignore score > thresh, if is_ascend=true.
topk (int, optional, default='-1') – Limit the number of matches to topk, set -1 for no limit
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.box_decode(data=None, anchors=None, std0=_Null, std1=_Null, std2=_Null, std3=_Null, clip=_Null, format=_Null, name=None, attr=None, out=None, **kwargs)

Decode bounding boxes training target with normalized center offsets. Input bounding boxes use corner type (x_{min}, y_{min}, x_{max}, y_{max}) or center type (x, y, width, height).

Defined in /home/smola/mxnet/src/operator/contrib/bounding_box.cc:L249

Parameters:

data (Symbol) – (B, N, 4) predicted bbox offset
anchors (Symbol) – (1, N, 4) encoded in corner or center
std0 (float, optional, default=1) – value to be divided from the 1st encoded values
std1 (float, optional, default=1) – value to be divided from the 2nd encoded values
std2 (float, optional, default=1) – value to be divided from the 3rd encoded values
std3 (float, optional, default=1) – value to be divided from the 4th encoded values
clip (float, optional, default=-1) – If larger than 0, bounding box target will be clipped to this value.
format ({'center', 'corner'}, optional, default='center') – The box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.box_encode(samples=None, matches=None, anchors=None, refs=None, means=None, stds=None, name=None, attr=None, out=None, **kwargs)

Encode bounding boxes training target with normalized center offsets. Input bounding boxes use corner type (x_{min}, y_{min}, x_{max}, y_{max}).

Defined in /home/smola/mxnet/src/operator/contrib/bounding_box.cc:L220

Parameters:

samples (Symbol) – (B, N) value +1 (positive), -1 (negative), 0 (ignore)
matches (Symbol) – (B, N) value range [0, M)
anchors (Symbol) – (B, N, 4) encoded in corner
refs (Symbol) – (B, M, 4) encoded in corner
means (Symbol) – (4,) Mean value to be subtracted from encoded values
stds (Symbol) – (4,) Std value to be divided from encoded values
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.box_iou(lhs=None, rhs=None, format=_Null, name=None, attr=None, out=None, **kwargs)

Bounding box overlap of two arrays.

The overlap is defined as Intersection-over-Union, aka IOU.

- lhs: (a_1, a_2, …, a_n, 4) array
- rhs: (b_1, b_2, …, b_n, 4) array
- output: (a_1, a_2, …, a_n, b_1, b_2, …, b_n) array

Note:

Zero gradients are back-propagated in this op for now.

Example:

x = [[0.5, 0.5, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]]
y = [[0.25, 0.25, 0.75, 0.75]]
box_iou(x, y, format='corner') = [[0.1428], [0.1428]]
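
The value 0.1428 can be checked by hand: each box in x shares a 0.25 x 0.25 square with y, so the intersection is 0.0625 and the union is 0.25 + 0.25 - 0.0625 = 0.4375, giving 1/7. A minimal sketch for 2-D inputs in corner format follows; the helper name `iou_corner` is hypothetical.

```python
def iou_corner(a, b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def box_iou(lhs, rhs):
    """Pairwise IoU: the result has one row per lhs box, one column per rhs box."""
    return [[iou_corner(a, b) for b in rhs] for a in lhs]


x = [[0.5, 0.5, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]]
y = [[0.25, 0.25, 0.75, 0.75]]
print(box_iou(x, y))
```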

Defined in /home/smola/mxnet/src/operator/contrib/bounding_box.cc:L144

Parameters:

lhs (Symbol) – The first input
rhs (Symbol) – The second input
format ({'center', 'corner'}, optional, default='corner') – The box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.box_nms(data=None, overlap_thresh=_Null, valid_thresh=_Null, topk=_Null, coord_start=_Null, score_index=_Null, id_index=_Null, background_id=_Null, force_suppress=_Null, in_format=_Null, out_format=_Null, name=None, attr=None, out=None, **kwargs)

Apply non-maximum suppression to input.

The output will be sorted in descending order according to score. Boxes that overlap a higher-scoring box by more than overlap_thresh, as well as background boxes, will be removed and filled with -1; the corresponding positions are recorded for backward propagation.

During back-propagation, the gradient will be copied to the original position according to the input index. For positions that have been suppressed, the in_grad will be assigned 0. In summary, gradients stick to their boxes and are either moved or discarded according to their original index in the input.

Input requirements:

The input tensor has at least 2 dimensions, (n, k); any higher dims will be regarded as batch, e.g. (a, b, c, d, n, k) == (a*b*c*d, n, k)
n is the number of boxes in each batch
k is the width of each box item.

By default, a box is [id, score, xmin, ymin, xmax, ymax, …], additional elements are allowed.

id_index: optional, use -1 to ignore, useful if force_suppress=False, which means we will skip highly overlapped boxes if one is apple while the other is car.
background_id: optional, default=-1, class id for background boxes, useful when id_index >= 0 which means boxes with background id will be filtered before nms.
coord_start: required, default=2, the starting index of the 4 coordinates. Two formats are supported:
- corner: [xmin, ymin, xmax, ymax]
- center: [x, y, width, height]
score_index: required, default=1, box score/confidence. When two boxes overlap IOU > overlap_thresh, the one with smaller score will be suppressed.
in_format and out_format: default=’corner’, specify in/out box formats.

Examples:

x = [[0, 0.5, 0.1, 0.1, 0.2, 0.2], [1, 0.4, 0.1, 0.1, 0.2, 0.2],
     [0, 0.3, 0.1, 0.1, 0.14, 0.14], [2, 0.6, 0.5, 0.5, 0.7, 0.8]]
box_nms(x, overlap_thresh=0.1, coord_start=2, score_index=1, id_index=0,
    force_suppress=True, in_format='corner', out_format='corner') =
    [[2, 0.6, 0.5, 0.5, 0.7, 0.8], [0, 0.5, 0.1, 0.1, 0.2, 0.2],
     [-1, -1, -1, -1, -1, -1], [-1, -1, -1, -1, -1, -1]]
out_grad = [[0.1, 0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
            [0.3, 0.3, 0.3, 0.3, 0.3, 0.3], [0.4, 0.4, 0.4, 0.4, 0.4, 0.4]]
# exe.backward
in_grad = [[0.2, 0.2, 0.2, 0.2, 0.2, 0.2], [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0], [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
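
The forward result in this example can be reproduced with a short greedy suppression loop. The sketch covers only one (n, k) array in corner format and omits `valid_thresh`, `topk`, `background_id`, and the gradient bookkeeping; it is an illustration of the described behavior, not the MXNet kernel, and the helper `iou` is a hypothetical name.

```python
def iou(a, b):
    """IoU of two [xmin, ymin, xmax, ymax] boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def box_nms(boxes, overlap_thresh=0.5, coord_start=2, score_index=1,
            id_index=-1, force_suppress=False):
    """Greedy NMS over one (n, k) list of boxes; suppressed rows become -1."""
    order = sorted(range(len(boxes)),
                   key=lambda i: boxes[i][score_index], reverse=True)
    kept = []
    for i in order:
        coords_i = boxes[i][coord_start:coord_start + 4]
        suppressed = False
        for j in kept:
            # Without force_suppress, only boxes of the same class compete.
            competes = (force_suppress or id_index < 0
                        or boxes[i][id_index] == boxes[j][id_index])
            coords_j = boxes[j][coord_start:coord_start + 4]
            if competes and iou(coords_i, coords_j) > overlap_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    width = len(boxes[0])
    out = [list(boxes[i]) for i in kept]
    out += [[-1] * width for _ in range(len(boxes) - len(kept))]
    return out


x = [[0, 0.5, 0.1, 0.1, 0.2, 0.2], [1, 0.4, 0.1, 0.1, 0.2, 0.2],
     [0, 0.3, 0.1, 0.1, 0.14, 0.14], [2, 0.6, 0.5, 0.5, 0.7, 0.8]]
print(box_nms(x, overlap_thresh=0.1, id_index=0, force_suppress=True))
```

With `force_suppress=False` and `id_index=0`, the second input box (class 1) would survive, because it no longer competes with the class-0 box it overlaps.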

Defined in /home/smola/mxnet/src/operator/contrib/bounding_box.cc:L93

Parameters:

data (Symbol) – The input
overlap_thresh (float, optional, default=0.5) – Overlapping (IoU) threshold to suppress object with smaller score.
valid_thresh (float, optional, default=0) – Filter input boxes to those whose scores are greater than valid_thresh.
topk (int, optional, default='-1') – Apply nms to topk boxes with descending scores, -1 for no restriction.
coord_start (int, optional, default='2') – Start index of the consecutive 4 coordinates.
score_index (int, optional, default='1') – Index of the scores/confidence of boxes.
id_index (int, optional, default='-1') – Optional, index of the class categories, -1 to disable.
background_id (int, optional, default='-1') – Optional, id of the background class which will be ignored in nms.
force_suppress (boolean, optional, default=0) – Optional, if set to false and id_index is provided, nms will only apply to boxes that belong to the same category.
in_format ({'center', 'corner'}, optional, default='corner') – The input box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].
out_format ({'center', 'corner'}, optional, default='corner') – The output box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.broadcast_greater(lhs=None, rhs=None, name=None, attr=None, out=None, **kwargs)

Returns the result of element-wise greater than (>) comparison operation with broadcasting.

Example:

x = [[ 1.,  1.,  1.],
     [ 1.,  1.,  1.]]

y = [[ 0.],
     [ 1.]]

broadcast_greater(x, y) = [[ 1.,  1.,  1.],
                           [ 0.,  0.,  0.]]
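
The example can be reproduced with a small 2-D sketch in which any axis of size 1 is reused for every index along that axis. It handles only two 2-D nested lists and is meant as an illustration of the broadcasting rule, not as the operator's implementation.

```python
def broadcast_greater(x, y):
    """Element-wise x > y for 2-D nested lists, broadcasting size-1 axes."""
    rows = max(len(x), len(y))
    cols = max(len(x[0]), len(y[0]))

    def at(m, i, j):
        # An axis of size 1 is reused for every index along that axis.
        return m[i if len(m) > 1 else 0][j if len(m[0]) > 1 else 0]

    return [[1.0 if at(x, i, j) > at(y, i, j) else 0.0 for j in range(cols)]
            for i in range(rows)]


x = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
y = [[0.0], [1.0]]
print(broadcast_greater(x, y))
```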

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L84

Parameters:

lhs (Symbol) – First input to the function
rhs (Symbol) – Second input to the function
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.broadcast_like(lhs=None, rhs=None, lhs_axes=_Null, rhs_axes=_Null, name=None, attr=None, out=None, **kwargs)

Broadcasts lhs to have the same shape as rhs.

Broadcasting is a mechanism that allows ndarrays to perform arithmetic operations with arrays of different shapes efficiently without creating multiple copies of arrays. See also Broadcasting for more explanation.

Broadcasting is allowed on axes with size 1, such as from (2,1,3,1) to (2,8,3,9). Elements will be duplicated on the broadcasted axes.

For example:

broadcast_like([[1,2,3]], [[5,6,7],[7,8,9]]) = [[ 1.,  2.,  3.],
                                                [ 1.,  2.,  3.]]

broadcast_like([9], [1,2,3,4,5], lhs_axes=(0,), rhs_axes=(-1,)) = [9,9,9,9,9]
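
The first example can be sketched for 2-D nested lists of equal rank: only the shape of rhs is used, and each size-1 axis of lhs is duplicated to match it. The sketch ignores `lhs_axes` and `rhs_axes` and is a simplified stand-in, not the MXNet operator.

```python
def broadcast_like(lhs, rhs):
    """Duplicate size-1 axes of a 2-D lhs so it has the shape of rhs."""
    rows, cols = len(rhs), len(rhs[0])
    if len(lhs) not in (1, rows) or len(lhs[0]) not in (1, cols):
        raise ValueError("only axes of size 1 can be broadcast")
    return [[float(lhs[i if len(lhs) > 1 else 0][j if len(lhs[0]) > 1 else 0])
             for j in range(cols)]
            for i in range(rows)]


print(broadcast_like([[1, 2, 3]], [[5, 6, 7], [7, 8, 9]]))
```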

Defined in /home/smola/mxnet/src/operator/tensor/broadcast_reduce_op_value.cc:L174

Parameters:

lhs (Symbol) – First input.
rhs (Symbol) – Second input.
lhs_axes (Shape or None, optional, default=None) – Axes to perform broadcast on in the first input array
rhs_axes (Shape or None, optional, default=None) – Axes to copy from the second input array
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.cast(data=None, dtype=_Null, name=None, attr=None, out=None, **kwargs)¶

Casts all elements of the input to a new type.

Note

The legacy operator name Cast is deprecated. Use cast instead.

Example:

cast([0.9, 1.3], dtype='int32') = [0, 1]
cast([1e20, 11.1], dtype='float16') = [inf, 11.09375]
cast([300, 11.1, 10.9, -1, -3], dtype='uint8') = [44, 11, 10, 255, 253]
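
The integer examples above are consistent with truncation toward zero followed, for uint8, by wrap-around modulo 256. The plain-Python sketch below reproduces them under that assumption. The helper names are hypothetical, the float16 case is left out, and actual out-of-range behavior may depend on the backend.

```python
def cast_to_int32(values):
    """Truncate toward zero, as in the first example."""
    return [int(v) for v in values]


def cast_to_uint8(values):
    """Truncate toward zero, then wrap modulo 256 (assumption drawn from the example)."""
    return [int(v) % 256 for v in values]


as_int32 = cast_to_int32([0.9, 1.3])                 # [0, 1]
as_uint8 = cast_to_uint8([300, 11.1, 10.9, -1, -3])  # [44, 11, 10, 255, 253]
```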

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L789

Parameters:

data (Symbol) – The input.
dtype ({'bfloat16', 'bool', 'float16', 'float32', 'float64', 'int16', 'int32', 'int64', 'int8', 'uint16', 'uint32', 'uint64', 'uint8'}, required) – Output data type.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.cond(*data, **kwargs)¶

Run an if-then-else using user-defined condition and computation

From:/home/smola/mxnet/src/operator/npx_control_flow.cc:1274 This function supports a variable number of positional inputs.

Parameters:

cond (Symbol) – Input graph for the condition.
then_branch (Symbol) – Input graph for the then branch.
else_branch (Symbol) – Input graph for the else branch.
data (Symbol[]) – The input arrays that include data arrays and states.
num_outputs (int, required) – The number of outputs of the subgraph.
cond_input_locs (tuple of <long>, required) – The locations of cond’s inputs in the given inputs.
then_input_locs (tuple of <long>, required) – The locations of then’s inputs in the given inputs.
else_input_locs (tuple of <long>, required) – The locations of else’s inputs in the given inputs.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.constraint_check(input=None, msg=_Null, name=None, attr=None, out=None, **kwargs)¶

This operator checks whether all the elements in a boolean tensor are true. If not, a ValueError exception is raised in the backend with the given error message. To make sure the operator is evaluated, multiply the original tensor by the return value of this operator so that the operator becomes part of the computation graph; otherwise the check does not run in symbolic mode.

Parameters:

x (ndarray) – A boolean tensor.
msg (string) – The error message in the exception.

Returns:

out – If all the elements in the input tensor are true, array(True) is returned; otherwise a ValueError exception is raised before anything is returned.

Return type:

ndarray

Examples

>>> loc = np.zeros((2,2))
>>> scale = np.array(some_value)  # some_value is a placeholder for the scale values
>>> constraint = (scale > 0)
>>> np.random.normal(loc,
...                  scale * npx.constraint_check(constraint, 'Scale should be larger than zero'))

If all elements in the scale tensor are greater than zero, npx.constraint_check returns np.array(True), and multiplying by it does not change the value of scale. If some of the elements in the scale tensor violate the constraint, i.e. the boolean tensor constraint contains False, a ValueError exception with the given message ‘Scale should be larger than zero’ is raised.
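
The multiply-by-the-result pattern can be illustrated in plain Python, where `True` also behaves as 1 under multiplication. This is only a sketch of the semantics described above; `constraint_check_sketch` is a hypothetical stand-in for the operator.

```python
def constraint_check_sketch(flags, msg):
    """Return True if every flag is true; otherwise raise ValueError(msg)."""
    if not all(flags):
        raise ValueError(msg)
    return True


message = "Scale should be larger than zero"

scale = [0.5, 2.0]
ok = constraint_check_sketch([s > 0 for s in scale], message)
checked_scale = [s * ok for s in scale]  # multiplying by True leaves the values unchanged

try:
    constraint_check_sketch([s > 0 for s in [1.0, -1.0]], message)
    error_message = None
except ValueError as err:
    error_message = str(err)
```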

mxnet.symbol.numpy_extension.contrib_calibrate_entropy(hist=None, hist_edges=None, num_quantized_bins=_Null, name=None, attr=None, out=None, **kwargs)¶

Provide calibrated min/max for input histogram.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/calibrate.cc:L207

Parameters:

hist (Symbol) – An ndarray/symbol of type float32
hist_edges (Symbol) – An ndarray/symbol of type float32
num_quantized_bins (int, optional, default='255') – The number of quantized bins.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.contrib_quantize(data=None, min_range=None, max_range=None, out_type=_Null, name=None, attr=None, out=None, **kwargs)¶

Quantize an input tensor from float to out_type, with user-specified min_range and max_range.

min_range and max_range are scalar floats that specify the range for the input data.

When out_type is uint8, the output is calculated using the following equation:

out[i] = (in[i] - min_range) * range(OUTPUT_TYPE) / (max_range - min_range) + 0.5,

where range(T) = numeric_limits<T>::max() - numeric_limits<T>::min().

When out_type is int8, the output is calculated using the following equation, which keeps the quantized value zero-centered:

out[i] = sign(in[i]) * min(abs(in[i] * scale) + 0.5f, quantized_range),

where quantized_range = MinAbs(max(int8), min(int8)) and scale = quantized_range / MaxAbs(min_range, max_range).
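
As a worked example, the two rules can be written out in plain Python. This sketch reads the int8 rule as min(abs(in[i] * scale) + 0.5, quantized_range) and assumes the final conversion to the integer type truncates. The function names are hypothetical.

```python
def quantize_uint8(x, min_range, max_range):
    """uint8 rule: shift by min_range, scale to 0..255, add 0.5 and truncate."""
    out_range = 255 - 0  # numeric_limits<uint8>::max() - numeric_limits<uint8>::min()
    return int((x - min_range) * out_range / (max_range - min_range) + 0.5)


def quantize_int8(x, min_range, max_range):
    """int8 rule: zero-centered, symmetric scale, clamped to quantized_range."""
    quantized_range = min(abs(127), abs(-128))  # MinAbs(max(int8), min(int8)) = 127
    scale = quantized_range / max(abs(min_range), abs(max_range))
    sign = (x > 0) - (x < 0)
    return int(sign * min(abs(x * scale) + 0.5, quantized_range))


u8 = [quantize_uint8(v, -1.0, 1.0) for v in (-1.0, 0.0, 1.0)]  # [0, 128, 255]
i8 = [quantize_int8(v, -1.0, 1.0) for v in (-1.0, 0.0, 0.5)]   # [-127, 0, 64]
```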

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantize.cc:L94

Parameters:

data (Symbol) – An ndarray/symbol of type float32
min_range (Symbol) – The minimum scalar value possibly produced for the input
max_range (Symbol) – The maximum scalar value possibly produced for the input
out_type ({'int8', 'uint8'},optional, default='uint8') – Output data type.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.contrib_quantize_v2(data=None, out_type=_Null, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)¶

Quantize an input tensor from float to out_type, with user-specified min_calib_range and max_calib_range or the input range collected at runtime.

Output min_range and max_range are scalar floats that specify the range for the input data.

When out_type is uint8, the output is calculated using the following equation:

out[i] = (in[i] - min_range) * range(OUTPUT_TYPE) / (max_range - min_range) + 0.5,

where range(T) = numeric_limits<T>::max() - numeric_limits<T>::min().

When out_type is int8, the output is calculated using the following equation, which keeps the quantized value zero-centered:

out[i] = sign(in[i]) * min(abs(in[i] * scale) + 0.5f, quantized_range),

where quantized_range = MinAbs(max(int8), min(int8)) and scale = quantized_range / MaxAbs(min_range, max_range).

When out_type is auto, the output type is determined automatically by min_calib_range if it is present. If min_calib_range < 0.0f, the output type will be int8; otherwise it will be uint8. If min_calib_range isn’t present, the output type will be int8.
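
The selection rule for out_type can be summarized as a small decision function. This is a plain-Python sketch of the rule as stated above; `resolve_out_type` is a hypothetical name.

```python
def resolve_out_type(out_type, min_calib_range=None):
    """Return the effective output type for a given out_type setting."""
    if out_type != "auto":
        return out_type          # an explicit int8 or uint8 is used as given
    if min_calib_range is None:
        return "int8"            # no calibration range available
    return "int8" if min_calib_range < 0.0 else "uint8"


negative_range = resolve_out_type("auto", min_calib_range=-0.5)  # 'int8'
non_negative = resolve_out_type("auto", min_calib_range=0.0)     # 'uint8'
no_range = resolve_out_type("auto")                              # 'int8'
```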

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantize_v2.cc:L104

Parameters:

data (Symbol) – An ndarray/symbol of type float32
out_type ({'auto', 'int8', 'uint8'},optional, default='int8') – Output data type. auto can be specified to automatically determine output type according to min_calib_range.
min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32. If present, it will be used to quantize the fp32 data into int8 or uint8.
max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32. If present, it will be used to quantize the fp32 data into int8 or uint8.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.contrib_quantized_rnn(data=None, parameters=None, state=None, state_cell=None, data_scale=None, data_shift=None, state_size=_Null, num_layers=_Null, bidirectional=_Null, mode=_Null, p=_Null, state_outputs=_Null, projection_size=_Null, lstm_state_clip_min=_Null, lstm_state_clip_max=_Null, lstm_state_clip_nan=_Null, use_sequence_length=_Null, name=None, attr=None, out=None, **kwargs)¶

RNN operator for input data of type uint8. The weights of each gate are converted to int8, while the bias is accumulated in float32. The hidden state and cell state are in float32. For the input data, two more arguments of type float32 must be provided, representing the thresholds for quantizing the data from float32 to uint8. The final outputs contain the recurrent result in float32. Only quantization of vanilla LSTM networks is supported.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_rnn.cc:L320

Parameters:

data (Symbol) – Input data.
parameters (Symbol) – weight.
state (Symbol) – initial hidden state of the RNN
state_cell (Symbol) – initial cell state for LSTM networks (only for LSTM)
data_scale (Symbol) – quantization scale of data.
data_shift (Symbol) – quantization shift of data.
state_size (int (non-negative), required) – size of the state for each layer
num_layers (int (non-negative), required) – number of stacked layers
bidirectional (boolean, optional, default=0) – whether to use bidirectional recurrent layers
mode ({'gru', 'lstm', 'rnn_relu', 'rnn_tanh'}, required) – the type of RNN to compute
p (float, optional, default=0) – drop rate of the dropout on the outputs of each RNN layer, except the last layer.
state_outputs (boolean, optional, default=0) – Whether to have the states as symbol outputs.
projection_size (int or None, optional, default='None') – size of the projection
lstm_state_clip_min (double or None, optional, default=None) – Minimum clip value of LSTM states. This option must be used together with lstm_state_clip_max.
lstm_state_clip_max (double or None, optional, default=None) – Maximum clip value of LSTM states. This option must be used together with lstm_state_clip_min.
lstm_state_clip_nan (boolean, optional, default=0) – Whether to stop NaN from propagating in state by clipping it to min/max. If clipping range is not specified, this option is ignored.
use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.convolution(data=None, weight=None, bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, workspace=_Null, no_bias=_Null, cudnn_tune=_Null, cudnn_off=_Null, layout=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute N-D convolution on (N+2)-D input.

In the 2-D convolution, given input data with shape (batch_size, channel, height, width), the output is computed by

\[out[n,i,:,:] = bias[i] + \sum_{j=0}^{channel-1} data[n,j,:,:] \star weight[i,j,:,:]\]

where \(\star\) is the 2-D cross-correlation operator.

For general 2-D convolution, the shapes are

data: (batch_size, channel, height, width)
weight: (num_filter, channel, kernel[0], kernel[1])
bias: (num_filter,)
out: (batch_size, num_filter, out_height, out_width).

Define:

f(x,k,p,s,d) = floor((x+2*p-d*(k-1)-1)/s)+1

then we have:

out_height=f(height, kernel[0], pad[0], stride[0], dilate[0])
out_width=f(width, kernel[1], pad[1], stride[1], dilate[1])
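
The size formula is easy to check by hand. The following plain-Python sketch evaluates f for two configurations and derives the full 2-D output shape; the helper names are hypothetical.

```python
def conv_out_size(x, k, p=0, s=1, d=1):
    """f(x,k,p,s,d) = floor((x + 2*p - d*(k-1) - 1) / s) + 1"""
    return (x + 2 * p - d * (k - 1) - 1) // s + 1


def conv2d_out_shape(data_shape, num_filter, kernel, pad=(0, 0), stride=(1, 1), dilate=(1, 1)):
    """Output shape for NCHW data of shape (batch_size, channel, height, width)."""
    batch_size, _, height, width = data_shape
    out_height = conv_out_size(height, kernel[0], pad[0], stride[0], dilate[0])
    out_width = conv_out_size(width, kernel[1], pad[1], stride[1], dilate[1])
    return (batch_size, num_filter, out_height, out_width)


# 3x3 kernel, padding 1, stride 2: (32 + 2 - 2 - 1) // 2 + 1 = 16
strided = conv_out_size(32, 3, p=1, s=2, d=1)
# 5-wide kernel with dilation 2: (32 - 8 - 1) // 1 + 1 = 24
dilated = conv_out_size(32, 5, p=0, s=1, d=2)
out_shape = conv2d_out_shape((8, 3, 32, 32), num_filter=16, kernel=(3, 3), pad=(1, 1))
```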

If no_bias is set to be true, then the bias term is ignored.

The default data layout is NCHW, namely (batch_size, channel, height, width). We can choose other layouts such as NWC.

If num_group is larger than 1, denoted by g, then split the input data evenly into g parts along the channel axis, and also evenly split weight along the first dimension. Next compute the convolution on the i-th part of the data with the i-th weight part. The output is obtained by concatenating all the g results.

1-D convolution does not have height dimension but only width in space.

data: (batch_size, channel, width)
weight: (num_filter, channel, kernel[0])
bias: (num_filter,)
out: (batch_size, num_filter, out_width).

3-D convolution adds an additional depth dimension besides height and width. The shapes are

data: (batch_size, channel, depth, height, width)
weight: (num_filter, channel, kernel[0], kernel[1], kernel[2])
bias: (num_filter,)
out: (batch_size, num_filter, out_depth, out_height, out_width).

Both weight and bias are learnable parameters.

There are other options to tune the performance.

cudnn_tune: enabling this option leads to higher startup time but may give faster speed. Options are
- off: no tuning
- limited_workspace: run tests and pick the fastest algorithm that doesn’t exceed the workspace limit.
- fastest: pick the fastest algorithm and ignore workspace limit.
- None (default): the behavior is determined by environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT. 0 for off, 1 for limited workspace (default), 2 for fastest.
workspace: A large number leads to more (GPU) memory usage but may improve the performance.

Defined in /home/smola/mxnet/src/operator/nn/convolution.cc:L509

Parameters:

data (Symbol) – Input data to the ConvolutionOp.
weight (Symbol) – Weight matrix.
bias (Symbol) – Bias parameter.
kernel (Shape(tuple), required) – Convolution kernel size: (w,), (h, w) or (d, h, w)
stride (Shape(tuple), optional, default=[]) – Convolution stride: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.
dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.
pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (w,), (h, w) or (d, h, w). Defaults to no padding.
num_filter (int (non-negative), required) – Convolution filter(channel) number
num_group (int (non-negative), optional, default=1) – Number of group partitions.
workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed (MB) in convolution. This parameter has two usages. When CUDNN is not used, it determines the effective batch size of the convolution kernel. When CUDNN is used, it controls the maximum temporary storage used for tuning the best CUDNN kernel when limited_workspace strategy is used.
no_bias (boolean, optional, default=0) – Whether to disable bias parameter.
cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algo by running performance test.
cudnn_off (boolean, optional, default=0) – Turn off cudnn for this layer.
layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') –

Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d. NHWC and NDHWC are only supported on GPU.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.ctc_loss(data=None, label=None, data_lengths=None, label_lengths=None, use_data_lengths=_Null, use_label_lengths=_Null, blank_label=_Null, name=None, attr=None, out=None, **kwargs)¶

Connectionist Temporal Classification Loss.

Note

The existing alias contrib_CTCLoss is deprecated.

The shapes of the inputs and outputs:

data: (sequence_length, batch_size, alphabet_size)
label: (batch_size, label_sequence_length)
out: (batch_size)

The data tensor consists of sequences of activation vectors (without applying softmax), with i-th channel in the last dimension corresponding to i-th label for i between 0 and alphabet_size-1 (i.e., always 0-indexed). Alphabet size should include one additional value reserved for blank label. When blank_label is "first", the 0-th channel is reserved for activation of blank label; otherwise, if it is "last", the (alphabet_size-1)-th channel is reserved for blank label.

label is an index matrix of integers. When blank_label is "first", the value 0 is then reserved for blank label, and should not be passed in this matrix. Otherwise, when blank_label is "last", the value (alphabet_size-1) is reserved for blank label.

If a sequence of labels is shorter than label_sequence_length, use the special padding value at the end of the sequence to conform it to the correct length. The padding value is 0 when blank_label is "first", and -1 otherwise.

For example, suppose the vocabulary is [a, b, c], and in one batch we have three sequences ‘ba’, ‘cbb’, and ‘abac’. When blank_label is "first", we can index the labels as {‘a’: 1, ‘b’: 2, ‘c’: 3}, and we reserve the 0-th channel for blank label in data tensor. The resulting label tensor should be padded to be:

[[2, 1, 0, 0], [3, 2, 2, 0], [1, 2, 1, 3]]

When blank_label is "last", we can index the labels as {‘a’: 0, ‘b’: 1, ‘c’: 2}, and we reserve the channel index 3 for blank label in data tensor. The resulting label tensor should be padded to be:

[[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]]
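
The two label tensors above can be generated mechanically. The sketch below is plain Python that follows the indexing and padding rules just described; `encode_labels` is a hypothetical helper, not part of the library.

```python
def encode_labels(sequences, vocabulary, blank_label="first"):
    """Map token sequences to a padded integer label matrix."""
    if blank_label == "first":
        offset, padding = 1, 0   # index 0 is the blank, so tokens start at 1
    else:
        offset, padding = 0, -1  # tokens start at 0; the blank is alphabet_size-1
    index = {token: i + offset for i, token in enumerate(vocabulary)}
    width = max(len(seq) for seq in sequences)
    return [[index[t] for t in seq] + [padding] * (width - len(seq))
            for seq in sequences]


vocabulary = ["a", "b", "c"]
batch = ["ba", "cbb", "abac"]
labels_first = encode_labels(batch, vocabulary, "first")
labels_last = encode_labels(batch, vocabulary, "last")
```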

out is a list of CTC loss values, one per example in the batch.

See Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A. Graves et al. for more information on the definition and the algorithm.

Defined in /home/smola/mxnet/src/operator/nn/ctc_loss.cc:L104

Parameters:

data (Symbol) – Input ndarray
label (Symbol) – Ground-truth labels for the loss.
data_lengths (Symbol) – Lengths of data for each of the samples. Only required when use_data_lengths is true.
label_lengths (Symbol) – Lengths of labels for each of the samples. Only required when use_label_lengths is true.
use_data_lengths (boolean, optional, default=0) – Whether the data lengths are decided by data_lengths. If false, the lengths are equal to the max sequence length.
use_label_lengths (boolean, optional, default=0) – Whether the label lengths are decided by label_lengths, or derived from padding_mask. If false, the lengths are derived from the first occurrence of the value of padding_mask. The value of padding_mask is 0 when first CTC label is reserved for blank, and -1 when last label is reserved for blank. See blank_label.
blank_label ({'first', 'last'},optional, default='first') – Set the label that is reserved for blank label. If “first”, 0-th label is reserved, and label values for tokens in the vocabulary are between 1 and alphabet_size-1, and the padding mask is 0. If “last”, last label value alphabet_size-1 is reserved for blank label instead, and label values for tokens in the vocabulary are between 0 and alphabet_size-2, and the padding mask is -1.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.deconvolution(data=None, weight=None, bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, adj=_Null, target_shape=_Null, num_filter=_Null, num_group=_Null, workspace=_Null, no_bias=_Null, cudnn_tune=_Null, cudnn_off=_Null, layout=_Null, name=None, attr=None, out=None, **kwargs)¶

Computes 1D, 2D or 3D transposed convolution (aka fractionally strided convolution) of the input tensor. This operation can be seen as the gradient of Convolution operation with respect to its input. Convolution usually reduces the size of the input. Transposed convolution works the other way, going from a smaller input to a larger output while preserving the connectivity pattern.

Parameters:

data (Symbol) – Input tensor to the deconvolution operation.
weight (Symbol) – Weights representing the kernel.
bias (Symbol) – Bias added to the result after the deconvolution operation.
kernel (Shape(tuple), required) – Deconvolution kernel size: (w,), (h, w) or (d, h, w). This is the same as the kernel size used for the corresponding convolution.
stride (Shape(tuple), optional, default=[]) – The stride used for the corresponding convolution: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.
dilate (Shape(tuple), optional, default=[]) – Dilation factor for each dimension of the input: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.
pad (Shape(tuple), optional, default=[]) – The amount of implicit zero padding added during convolution for each dimension of the input: (w,), (h, w) or (d, h, w). (kernel-1)/2 is usually a good choice. If target_shape is set, pad will be ignored and a padding that will generate the target shape will be used. Defaults to no padding.
adj (Shape(tuple), optional, default=[]) – Adjustment for output shape: (w,), (h, w) or (d, h, w). If target_shape is set, adj will be ignored and computed accordingly.
target_shape (Shape(tuple), optional, default=[]) – Shape of the output tensor: (w,), (h, w) or (d, h, w).
num_filter (int (non-negative), required) – Number of output filters.
num_group (int (non-negative), optional, default=1) – Number of group partitions.
workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed (MB) in deconvolution. This parameter has two usages. When CUDNN is not used, it determines the effective batch size of the deconvolution kernel. When CUDNN is used, it controls the maximum temporary storage used for tuning the best CUDNN kernel when limited_workspace strategy is used.
no_bias (boolean, optional, default=1) – Whether to disable bias parameter.
cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algorithm by running performance test.
cudnn_off (boolean, optional, default=0) – Turn off cudnn for this layer.
layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC'},optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d. NHWC and NDHWC are only supported on GPU.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.deformable_convolution(data=None, offset=None, weight=None, bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, num_deformable_group=_Null, workspace=_Null, no_bias=_Null, layout=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute 2-D deformable convolution on 4-D input.

The deformable convolution operation is described in https://arxiv.org/abs/1703.06211

For 2-D deformable convolution, the shapes are

data: (batch_size, channel, height, width)
offset: (batch_size, num_deformable_group * kernel[0] * kernel[1] * 2, height, width)
weight: (num_filter, channel, kernel[0], kernel[1])
bias: (num_filter,)
out: (batch_size, num_filter, out_height, out_width).

Define:

f(x,k,p,s,d) = floor((x+2*p-d*(k-1)-1)/s)+1

then we have:

out_height=f(height, kernel[0], pad[0], stride[0], dilate[0])
out_width=f(width, kernel[1], pad[1], stride[1], dilate[1])

If no_bias is set to be true, then the bias term is ignored.

The default data layout is NCHW, namely (batch_size, channel, height, width).

If num_deformable_group is larger than 1, denoted by dg, then split the input offset evenly into dg parts along the channel axis, and also evenly split data into dg parts along the channel axis. Next compute the deformable convolution, applying the i-th part of the offset to the i-th part of the data.

Both weight and bias are learnable parameters.

Defined in /home/smola/mxnet/src/operator/deformable_convolution.cc:L80

Parameters:

data (Symbol) – Input data to the DeformableConvolutionOp.
offset (Symbol) – Input offset to the DeformableConvolutionOp.
weight (Symbol) – Weight matrix.
bias (Symbol) – Bias parameter.
kernel (Shape(tuple), required) – Convolution kernel size: (h, w) or (d, h, w)
stride (Shape(tuple), optional, default=[]) – Convolution stride: (h, w) or (d, h, w). Defaults to 1 for each dimension.
dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (h, w) or (d, h, w). Defaults to 1 for each dimension.
pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (h, w) or (d, h, w). Defaults to no padding.
num_filter (long, required) – Convolution filter(channel) number
num_group (long, optional, default=1) – Number of group partitions.
num_deformable_group (long, optional, default=1) – Number of deformable group partitions.
workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed for convolution (MB).
no_bias (boolean, optional, default=0) – Whether to disable bias parameter.
layout ({None, 'NCDHW', 'NCHW', 'NCW'},optional, default='None') –

Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.digamma(data=None, name=None, attr=None, out=None, **kwargs)¶

Returns element-wise log derivative of the gamma function of the input.

The storage type of digamma output is always dense
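
As a numerical illustration, the log derivative of the gamma function can be approximated by differentiating `math.lgamma` with a central difference. This is only a sketch for checking values and not how the operator is implemented. The helper name and step size are assumptions.

```python
import math


def digamma_numeric(x, h=1e-5):
    """Central-difference derivative of log|gamma(x)|."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)


psi_1 = digamma_numeric(1.0)  # close to -0.5772 (minus the Euler-Mascheroni constant)
psi_2 = digamma_numeric(2.0)  # close to psi_1 + 1, by the recurrence psi(x+1) = psi(x) + 1/x
```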

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.dropout(data=None, p=_Null, mode=_Null, axes=_Null, cudnn_off=_Null, name=None, attr=None, out=None, **kwargs)¶

Applies dropout operation to input array.

During training, each element of the input is set to zero with probability p. The whole array is rescaled by \(1/(1-p)\) to keep the expected sum of the input unchanged.
During testing, this operator does not change the input if mode is ‘training’. If mode is ‘always’, the same computation as during training will be applied.

Example:

random.seed(998)
input_array = array([[3., 0.5,  -0.5,  2., 7.],
                    [2., -0.4,   7.,  3., 0.2]])
a = symbol.Variable('a')
dropout = symbol.Dropout(a, p = 0.2)
executor = dropout.simple_bind(a = input_array.shape)

## If training
executor.forward(is_train = True, a = input_array)
executor.outputs
[[ 3.75   0.625 -0.     2.5    8.75 ]
 [ 2.5   -0.5    8.75   3.75   0.   ]]

## If testing
executor.forward(is_train = False, a = input_array)
executor.outputs
[[ 3.     0.5   -0.5    2.     7.   ]
 [ 2.    -0.4    7.     3.     0.2  ]]
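
The rescaling by 1/(1-p) can also be seen in a plain-Python sketch of the rule described above. It uses Python's own random generator, so the positions of the zeros will not match the MXNet output shown. `dropout_sketch` and its arguments are hypothetical.

```python
import random


def dropout_sketch(values, p=0.5, training=True, mode="training", rng=random):
    """Zero each element with probability p and rescale survivors by 1/(1-p)."""
    if not training and mode != "always":
        return list(values)  # inference with mode='training': input is unchanged
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


row = [3.0, 0.5, -0.5, 2.0, 7.0]
train_out = dropout_sketch(row, p=0.2, training=True, rng=random.Random(0))
test_out = dropout_sketch(row, p=0.2, training=False)
```

With p = 0.2 every surviving element is multiplied by 1.25, which matches the values 3.75, 0.625, 2.5 and 8.75 in the training output above.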

Defined in /home/smola/mxnet/src/operator/nn/dropout.cc:L95

Parameters:

data (Symbol) – Input array to which dropout will be applied.
p (float, optional, default=0.5) – Fraction of the input that gets dropped out during training time.
mode ({'always', 'training'},optional, default='training') – Whether to only turn on dropout during training or to also turn on for inference.
axes (Shape(tuple), optional, default=[]) – Axes for variational dropout kernel.
cudnn_off (boolean or None, optional, default=0) – Whether to turn off cudnn in dropout operator. This option is ignored if axes is specified.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.embedding(data=None, weight=None, input_dim=_Null, output_dim=_Null, dtype=_Null, sparse_grad=_Null, name=None, attr=None, out=None, **kwargs)¶

Maps integer indices to vector representations (embeddings).

This operator maps words to real-valued vectors in a high-dimensional space, called word embeddings. These embeddings can capture semantic and syntactic properties of the words. For example, it has been noted that in the learned embedding spaces, similar words tend to be close to each other and dissimilar words far apart.

For an input array of shape (d1, …, dK), the shape of an output array is (d1, …, dK, output_dim). All the input values should be integers in the range [0, input_dim).

If the input_dim is ip0 and output_dim is op0, then shape of the embedding weight matrix must be (ip0, op0).

When “sparse_grad” is False, if any index mentioned is too large, it is replaced by the index that addresses the last vector in an embedding matrix. When “sparse_grad” is True, an error will be raised if invalid indices are found.

Examples:

input_dim = 4
output_dim = 5

// Each row in weight matrix y represents a word. So, y = (w0,w1,w2,w3)
y = [[  0.,   1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.,   9.],
     [ 10.,  11.,  12.,  13.,  14.],
     [ 15.,  16.,  17.,  18.,  19.]]

// Input array x represents n-grams(2-gram). So, x = [(w1,w3), (w0,w2)]
x = [[ 1.,  3.],
     [ 0.,  2.]]

// Mapped input x to its vector representation y.
Embedding(x, y, 4, 5) = [[[  5.,   6.,   7.,   8.,   9.],
                          [ 15.,  16.,  17.,  18.,  19.]],

                         [[  0.,   1.,   2.,   3.,   4.],
                          [ 10.,  11.,  12.,  13.,  14.]]]
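
The lookup in this example amounts to indexing rows of the weight matrix. The plain-Python sketch below reproduces it, including the fallback to the last row for indices that are too large (the behavior described for sparse_grad set to False). `embedding_lookup` is a hypothetical helper and does not handle negative indices.

```python
def embedding_lookup(indices, weight):
    """Replace every index by the corresponding row of the weight matrix."""
    last = len(weight) - 1

    def row(i):
        # An index that is too large falls back to the last embedding vector.
        return list(weight[min(int(i), last)])

    return [[row(i) for i in seq] for seq in indices]


# Weight matrix with input_dim = 4 rows and output_dim = 5 columns.
y = [[float(5 * r + c) for c in range(5)] for r in range(4)]
x = [[1.0, 3.0],
     [0.0, 2.0]]
out = embedding_lookup(x, y)  # shape (2, 2, 5)
```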

The storage type of weight can be either row_sparse or default.

Note

If “sparse_grad” is set to True, the storage type of gradient w.r.t weights will be “row_sparse”. Only a subset of optimizers support sparse gradients, including SGD, AdaGrad and Adam. Note that by default lazy updates are turned on, which may perform differently from standard updates. For more details, please check the Optimization API at: https://mxnet.apache.org/versions/master/api/python/docs/api/optimizer/index.html

Defined in /home/smola/mxnet/src/operator/tensor/indexing_op.cc:L758

Parameters:

data (Symbol) – The input array to the embedding operator.
weight (Symbol) – The embedding weight matrix.
input_dim (long, required) – Vocabulary size of the input indices.
output_dim (long, required) – Dimension of the embedding vectors.
dtype ({'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='float32') – Data type of weight.
sparse_grad (boolean, optional, default=0) – Compute row sparse gradient in the backward calculation. If set to True, the grad’s storage type is row_sparse.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.erf(data=None, name=None, attr=None, out=None, **kwargs)¶

Returns element-wise Gauss error function of the input.

Example:

erf([0, -1., 10.]) = [0., -0.8427, 1.]
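
The values in this example can be cross-checked with Python's standard library, which provides the same function as `math.erf`. This only verifies the numbers and does not call the operator.

```python
import math

values = [0.0, -1.0, 10.0]
erf_values = [round(math.erf(v), 4) for v in values]  # [0.0, -0.8427, 1.0]
```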

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L1015

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.erfinv(data=None, name=None, attr=None, out=None, **kwargs)¶

Returns element-wise inverse Gauss error function of the input.

Example:

erfinv([0, 0.5, -1.]) = [0., 0.4769, -inf]
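
Python's standard library has no inverse error function, but the values can be checked through the standard normal quantile function, using the identity erfinv(x) = Φ⁻¹((x + 1) / 2) / √2. The sketch below is a plain-Python cross-check, not the operator's implementation; `erfinv_sketch` is a hypothetical name.

```python
import math
from statistics import NormalDist


def erfinv_sketch(x):
    """Inverse error function via the standard normal quantile function."""
    if x == 1.0:
        return math.inf
    if x == -1.0:
        return -math.inf
    if abs(x) > 1.0:
        return math.nan  # erf only takes values in [-1, 1]
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)


inverse_values = [erfinv_sketch(v) for v in (0.0, 0.5, -1.0)]
round_trip = math.erf(erfinv_sketch(0.5))  # should recover 0.5
```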

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L1036

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.foreach(*data, **kwargs)¶

Run a for loop over an ndarray with user-defined computation

From:/home/smola/mxnet/src/operator/npx_control_flow.cc:1151 This function supports a variable number of positional inputs.

Parameters:

fn (Symbol) – Input graph.
data (Symbol[]) – The input arrays that include data arrays and states.
num_outputs (int, required) – The number of outputs of the subgraph.
num_out_data (int, required) – The number of output data of the subgraph.
in_state_locs (tuple of <long>, required) – The locations of loop states among the inputs.
in_data_locs (tuple of <long>, required) – The locations of input data among the inputs.
remain_locs (tuple of <long>, required) – The locations of remaining data among the inputs.
in_state_index (tuple of <long>, required) – The index mapping from out_states to in_states.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.fully_connected(data=None, weight=None, bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, name=None, attr=None, out=None, **kwargs)¶

Applies a linear transformation: \(Y = XW^T + b\).

If flatten is set to be true, then the shapes are:

data: (batch_size, x1, x2, …, xn)
weight: (num_hidden, x1 * x2 * … * xn)
bias: (num_hidden,)
out: (batch_size, num_hidden)

If flatten is set to be false, then the shapes are:

data: (x1, x2, …, xn, input_dim)
weight: (num_hidden, input_dim)
bias: (num_hidden,)
out: (x1, x2, …, xn, num_hidden)

The learnable parameters include both weight and bias.

If no_bias is set to be true, then the bias term is ignored.
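
The shape rules above can be restated as a small helper. This is a sketch only; `fully_connected_shapes` is a hypothetical name introduced for illustration and does not exist in MXNet.

```python
from math import prod

def fully_connected_shapes(data_shape, num_hidden, flatten=True):
    """Return (weight_shape, bias_shape, out_shape) following the rules listed above."""
    if flatten:
        # All axes after the first are collapsed into one input dimension.
        batch_size, *rest = data_shape
        weight_shape = (num_hidden, prod(rest))
        out_shape = (batch_size, num_hidden)
    else:
        # Only the last axis is transformed; leading axes pass through unchanged.
        *lead, input_dim = data_shape
        weight_shape = (num_hidden, input_dim)
        out_shape = (*lead, num_hidden)
    return weight_shape, (num_hidden,), out_shape

print(fully_connected_shapes((8, 3, 4, 5), num_hidden=10, flatten=True))
print(fully_connected_shapes((8, 3, 4, 5), num_hidden=10, flatten=False))
```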

Note

The sparse support for FullyConnected is limited to forward evaluation with row_sparse weight and bias, where the length of weight.indices and bias.indices must be equal to num_hidden. This could be useful for model inference with row_sparse weights trained with importance sampling or noise contrastive estimation.

To compute linear transformation with ‘csr’ sparse data, sparse.dot is recommended instead of sparse.FullyConnected.

Defined in /home/smola/mxnet/src/operator/nn/fully_connected.cc:L288

Parameters:

data (Symbol) – Input data.
weight (Symbol) – Weight matrix.
bias (Symbol) – Bias parameter.
num_hidden (int, required) – Number of hidden nodes of the output.
no_bias (boolean, optional, default=0) – Whether to disable bias parameter.
flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.gamma(data=None, name=None, attr=None, out=None, **kwargs)¶

Returns the gamma function (extension of the factorial function to the reals), computed element-wise on the input array.

The storage type of gamma output is always dense.

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.gammaln(data=None, name=None, attr=None, out=None, **kwargs)¶

Returns element-wise log of the absolute value of the gamma function of the input.

The storage type of gammaln output is always dense.
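
For intuition, the scalar counterparts in Python's standard library behave the same way: `math.lgamma` returns the log of the absolute value of the gamma function, so it stays real where `math.gamma` is negative. This sketch uses only the standard library, not MXNet.

```python
import math

for x in (0.5, 5.0, -0.5):
    g = math.gamma(x)
    # lgamma(x) equals log(|gamma(x)|), including where gamma(x) < 0.
    print(x, g, math.lgamma(x), math.log(abs(g)))
```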

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.gather_nd(data=None, indices=None, name=None, attr=None, out=None, **kwargs)¶

Gather elements or slices from data and store to a tensor whose shape is defined by indices.

Given data with shape (X_0, X_1, …, X_{N-1}) and indices with shape (M, Y_0, …, Y_{K-1}), the output will have shape (Y_0, …, Y_{K-1}, X_M, …, X_{N-1}), where M <= N. If M == N, output shape will simply be (Y_0, …, Y_{K-1}).

The elements in output are defined as follows:

output[y_0, ..., y_{K-1}, x_M, ..., x_{N-1}] = data[indices[0, y_0, ..., y_{K-1}],
                                                    ...,
                                                    indices[M-1, y_0, ..., y_{K-1}],
                                                    x_M, ..., x_{N-1}]

Examples:

data = [[0, 1], [2, 3]]
indices = [[1, 1, 0], [0, 1, 0]]
gather_nd(data, indices) = [2, 3, 0]

data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
indices = [[0, 1], [1, 0]]
gather_nd(data, indices) = [[3, 4], [5, 6]]
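
The indexing rule can be reproduced on nested Python lists. The following is a minimal reference sketch; `gather_nd_ref` is a hypothetical helper written for illustration and makes no claim about how MXNet implements the operator.

```python
def gather_nd_ref(data, indices):
    """Pure-Python gather_nd on nested lists (hypothetical helper, not an MXNet API)."""
    def shape_of(x):
        dims = []
        while isinstance(x, list):
            dims.append(len(x))
            x = x[0]
        return dims

    def at(x, idx):
        for i in idx:
            x = x[i]
        return x

    m = len(indices)                 # M: number of leading axes of data being indexed
    y_shape = shape_of(indices)[1:]  # (Y_0, ..., Y_{K-1})

    def build(prefix, dims):
        if not dims:
            # Collect the M coordinates stored at position `prefix`, then index data.
            coords = [at(indices[j], prefix) for j in range(m)]
            return at(data, coords)
        return [build(prefix + [i], dims[1:]) for i in range(dims[0])]

    return build([], y_shape)

print(gather_nd_ref([[0, 1], [2, 3]], [[1, 1, 0], [0, 1, 0]]))
print(gather_nd_ref([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[0, 1], [1, 0]]))
```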

Parameters:

data (Symbol) – data
indices (Symbol) – indices
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.group_norm(data=None, gamma=None, beta=None, num_groups=_Null, eps=_Null, output_mean_var=_Null, name=None, attr=None, out=None, **kwargs)¶

Group normalization.

The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. The mean and standard deviation are calculated separately over each group.

\[data = data.reshape((N, num_groups, C // num_groups, ...))\]

\[out = \frac{data - mean(data, axis)}{\sqrt{var(data, axis) + \epsilon}} * gamma + beta\]

Both gamma and beta are learnable parameters.
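
The grouping and normalization step can be sketched for a single sample in plain Python. The helper `group_norm_sample` is hypothetical, it assumes the population variance, and it deliberately omits the gamma and beta scale and shift.

```python
import math

def group_norm_sample(x, num_groups, eps=1e-5):
    """Normalize one sample given as C channels of equal length (gamma/beta omitted)."""
    channels = len(x)
    assert channels % num_groups == 0, "C must be divisible by num_groups"
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Every channel in the group shares the same mean and variance.
        out.extend([[(v - mean) * inv_std for v in channel] for channel in group])
    return out

x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
print(group_norm_sample(x, num_groups=2))
```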

Defined in /home/smola/mxnet/src/operator/nn/group_norm.cc:L77

Parameters:

data (Symbol) – Input data
gamma (Symbol) – gamma array
beta (Symbol) – beta array
num_groups (int, optional, default='1') – Total number of groups.
eps (float, optional, default=9.99999975e-06) – An epsilon parameter to prevent division by 0.
output_mean_var (boolean, optional, default=0) – Output the mean and std calculated along the given axis.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.index_add(a=None, ind=None, val=None, name=None, attr=None, out=None, **kwargs)¶

Add values to input according to given indexes. If there are repeated positions to be updated, the update values will be accumulated.

Parameters:

a (ndarray) – Input data. The array to be updated.
ind (ndarray) –
Indexes for indicating update positions. For example, array([[0, 1], [2, 3], [4, 5]]) indicates that there are two positions to be updated, which are (0, 2, 4) and (1, 3, 5). Note:
- ‘ind’ cannot be an empty array ‘[]’; in that case, please use the operator ‘add’ instead.
- 0 <= ind.ndim <= 2.
- ind.dtype should be ‘int32’ or ‘int64’.
val (ndarray) – Input data. The array to update the input ‘a’.

Returns:

out – The output array.

Return type:

ndarray

Examples

>>> a = np.zeros((2, 3, 4))
>>> ind = np.array([[0, 0], [0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(2).reshape(2) + 1
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[1., 2., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> ind = np.array([[0, 0], [0, 0], [0, 0]], dtype='int32')  # accumulate values in repeated positions
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[3., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> ind = np.array([[0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(8).reshape(2, 4)
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [4., 5., 6., 7.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> val = np.arange(4).reshape(4)  # broadcast 'val'
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [0., 1., 2., 3.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])
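
The accumulation rule for repeated positions can be sketched in plain Python. The helper `index_add_ref` is hypothetical and simplified: it assumes `ind` has one row per axis of `a` and that `val` holds one scalar per position, so it does not cover the broadcasting cases shown above.

```python
import copy

def index_add_ref(a, ind, val):
    """Add val[k] to a copy of `a` at the position given by column k of `ind`."""
    out = copy.deepcopy(a)  # the input is left untouched, as in the examples above
    for k in range(len(ind[0])):
        *outer, last = [row[k] for row in ind]
        target = out
        for i in outer:
            target = target[i]
        target[last] += val[k]  # += makes repeated positions accumulate
    return out

a = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # zeros of shape (2, 3, 4)
print(index_add_ref(a, [[0, 0], [0, 0], [0, 1]], [1, 2])[0][0])
print(index_add_ref(a, [[0, 0], [0, 0], [0, 0]], [1, 2])[0][0])
```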

mxnet.symbol.numpy_extension.index_update(a=None, ind=None, val=None, name=None, attr=None, out=None, **kwargs)¶

Update values in the input according to the given indexes. If multiple indices refer to the same location, it is undefined which update is chosen; the order of updates may be arbitrary and nondeterministic (e.g., due to concurrent updates on some hardware platforms). Using repeated positions is not recommended.

Parameters:

a (ndarray) – Input data. The array to be updated. Support dtype: ‘float32’, ‘float64’, ‘int32’, ‘int64’.
ind (ndarray) –
Indexes for indicating update positions. For example, array([[0, 1], [2, 3], [4, 5]]) indicates that there are two positions to be updated, which are (0, 2, 4) and (1, 3, 5). Note:
- ‘ind’ cannot be an empty array ‘[]’; in that case, please use the operator ‘add’ instead.
- 0 <= ind.ndim <= 2.
- ind.dtype should be ‘int32’ or ‘int64’.
val (ndarray) – Input data. The array to update the input ‘a’. Support dtype: ‘float32’, ‘float64’, ‘int32’, ‘int64’.

Returns:

out – The output array.

Return type:

ndarray

Examples

>>> a = np.zeros((2, 3, 4))
>>> ind = np.array([[0, 0], [0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(2).reshape(2) + 1
>>> b = npx.index_update(a, ind, val)
>>> b
array([[[1., 2., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> ind = np.array([[0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(8).reshape(2, 4)
>>> b = npx.index_update(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [4., 5., 6., 7.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> val = np.arange(4).reshape(4)  # broadcast 'val'
>>> b = npx.index_update(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [0., 1., 2., 3.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

mxnet.symbol.numpy_extension.instance_norm(data=None, gamma=None, beta=None, eps=_Null, name=None, attr=None, out=None, **kwargs)¶

Applies instance normalization to the n-dimensional input array.

This operator takes an n-dimensional input array where (n>2) and normalizes the input using the following formula:

\[out = \frac{x - mean[data]}{ \sqrt{Var[data] + \epsilon}} * gamma + beta\]

This layer is similar to batch normalization layer (BatchNorm) with two differences: first, the normalization is carried out per example (instance), not over a batch. Second, the same normalization is applied both at test and train time. This operation is also known as contrast normalization.

If the input data is of shape [batch, channel, spatial_dim1, spatial_dim2, …], gamma and beta parameters must be vectors of shape [channel].

This implementation is based on this paper [1].

Examples:

// Input of shape (2,1,2)
x = [[[ 1.1,  2.2]],
     [[ 3.3,  4.4]]]

// gamma parameter of length 1
gamma = [1.5]

// beta parameter of length 1
beta = [0.5]

// Instance normalization is calculated with the above formula
InstanceNorm(x,gamma,beta) = [[[-0.997527  ,  1.99752665]],
                              [[-0.99752653,  1.99752724]]]
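
The example can be reproduced with a plain-Python sketch of the formula. The helper `instance_norm_ref` is hypothetical; it assumes a (batch, channel, length) input, the population variance, and the documented default eps of 0.001.

```python
import math

def instance_norm_ref(x, gamma, beta, eps=1e-3):
    """Instance norm for input of shape (batch, channel, length) given as nested lists."""
    out = []
    for sample in x:
        normed = []
        for c, channel in enumerate(sample):
            # Statistics are computed per sample and per channel.
            mean = sum(channel) / len(channel)
            var = sum((v - mean) ** 2 for v in channel) / len(channel)
            scale = gamma[c] / math.sqrt(var + eps)
            normed.append([(v - mean) * scale + beta[c] for v in channel])
        out.append(normed)
    return out

x = [[[1.1, 2.2]], [[3.3, 4.4]]]
result = instance_norm_ref(x, gamma=[1.5], beta=[0.5])
print(result)
```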

Defined in /home/smola/mxnet/src/operator/instance_norm.cc:L94

Parameters:

data (Symbol) – An n-dimensional input array (n > 2) of the form [batch, channel, spatial_dim1, spatial_dim2, …].
gamma (Symbol) – A vector of length ‘channel’, which multiplies the normalized input.
beta (Symbol) – A vector of length ‘channel’, which is added to the product of the normalized input and the weight.
eps (float, optional, default=0.00100000005) – An epsilon parameter to prevent division by 0.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.interleaved_matmul_encdec_qk(queries=None, keys_values=None, heads=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute the matrix multiplication between the projections of queries and keys in multihead attention used as encoder-decoder attention.

The inputs must be a tensor of projections of queries following the layout: (seq_length, batch_size, num_heads * head_dim)

and a tensor of interleaved projections of keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 2)

The equivalent code would be:

# split the last axis into (num_heads, head_dim) so the 4-D transpose is valid
tmp = mx.nd.reshape(queries, shape=(0, 0, num_heads, -1))
q_proj = mx.nd.transpose(tmp, axes=(1, 2, 0, 3))
q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
tmp = mx.nd.reshape(keys_values, shape=(0, 0, num_heads, 2, -1))
k_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
k_proj = mx.nd.reshape(k_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L797

Parameters:

queries (Symbol) – Queries
keys_values (Symbol) – Keys and values interleaved
heads (int, required) – Set number of heads
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.interleaved_matmul_encdec_valatt(keys_values=None, attention=None, heads=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute the matrix multiplication between the projections of values and the attention weights in multihead attention used as encoder-decoder attention.

The inputs must be a tensor of interleaved projections of keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 2)

and the attention weights following the layout: (batch_size, seq_length, seq_length)

The equivalent code would be:

tmp = mx.nd.reshape(keys_values, shape=(0, 0, num_heads, 2, -1))
v_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
v_proj = mx.nd.reshape(v_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(attention, v_proj)
output = mx.nd.reshape(output, shape=(-1, num_heads, 0, 0), reverse=True)
output = mx.nd.transpose(output, axes=(0, 2, 1, 3))
output = mx.nd.reshape(output, shape=(0, 0, -1))

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L847

Parameters:

keys_values (Symbol) – Keys and values interleaved
attention (Symbol) – Attention maps
heads (int, required) – Set number of heads
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.interleaved_matmul_selfatt_qk(queries_keys_values=None, heads=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute the matrix multiplication between the projections of queries and keys in multihead attention used as self-attention.

The input must be a single tensor of interleaved projections of queries, keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 3)

The equivalent code would be:

tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
q_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
k_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
k_proj = mx.nd.reshape(k_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)
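
The reshape to (0, 0, num_heads, 3, -1) implies a specific ordering along the last axis: for each head, the query, key and value features are stored back to back. The following plain-Python sketch makes that offset arithmetic explicit; `interleaved_offset` and `split_interleaved` are hypothetical helpers, not MXNet functions.

```python
def interleaved_offset(head, slot, dim, head_dim, num_slots=3):
    """Offset on the last axis for a head, a slot (0=q, 1=k, 2=v) and a feature index."""
    return (head * num_slots + slot) * head_dim + dim

def split_interleaved(vector, num_heads, num_slots=3):
    """Split one position's interleaved feature vector into per-slot, per-head lists."""
    head_dim = len(vector) // (num_heads * num_slots)
    return [
        [
            [vector[interleaved_offset(h, s, d, head_dim, num_slots)] for d in range(head_dim)]
            for h in range(num_heads)
        ]
        for s in range(num_slots)
    ]

# Two heads with head_dim 2: the last axis has 2 * 2 * 3 = 12 entries.
q, k, v = split_interleaved(list(range(12)), num_heads=2)
print(q, k, v)
```

With `num_slots=2` the same arithmetic describes the keys and values layout used by the encoder-decoder variants.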

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L694

Parameters:

queries_keys_values (Symbol) – Interleaved queries, keys and values
heads (int, required) – Set number of heads
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.interleaved_matmul_selfatt_valatt(queries_keys_values=None, attention=None, heads=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute the matrix multiplication between the projections of values and the attention weights in multihead attention used as self-attention.

The inputs must be a tensor of interleaved projections of queries, keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 3)

and the attention weights following the layout: (batch_size, seq_length, seq_length)

The equivalent code would be:

tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
v_proj = mx.nd.transpose(tmp[:,:,:,2,:], axes=(1, 2, 0, 3))
v_proj = mx.nd.reshape(v_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(attention, v_proj)
output = mx.nd.reshape(output, shape=(-1, num_heads, 0, 0), reverse=True)
output = mx.nd.transpose(output, axes=(2, 0, 1, 3))
output = mx.nd.reshape(output, shape=(0, 0, -1))

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L745

Parameters:

queries_keys_values (Symbol) – Queries, keys and values interleaved
attention (Symbol) – Attention maps
heads (int, required) – Set number of heads
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.intgemm_fully_connected(data=None, weight=None, scaling=None, bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, out_type=_Null, name=None, attr=None, out=None, **kwargs)¶

Multiply matrices using 8-bit integers. data * weight.

Input tensor arguments are: data weight [scaling] [bias]

data: either float32 or prepared using intgemm_prepare_data (in which case it is int8).

weight: must be prepared using intgemm_prepare_weight.

scaling: present if and only if out_type is float32. If so, this is multiplied by the result before adding bias. Typically:

scaling = (max passed to intgemm_prepare_weight)/127.0 if data is in float32
scaling = (max passed to intgemm_prepare_data)/127.0 * (max passed to intgemm_prepare_weight)/127.0 if data is in int8
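
The two typical scaling cases can be written as one small function. This is a sketch with a hypothetical helper name, `intgemm_scaling`; the arguments stand for the maxabs values that were passed to the prepare operators.

```python
def intgemm_scaling(weight_maxabs, data_maxabs=None):
    """Typical scaling factor; pass data_maxabs only when data was pre-quantized to int8."""
    scaling = weight_maxabs / 127.0
    if data_maxabs is not None:
        # int8 data carries its own quantization factor that must be undone as well.
        scaling *= data_maxabs / 127.0
    return scaling

print(intgemm_scaling(2.54))        # float32 data
print(intgemm_scaling(2.54, 12.7))  # int8 data from intgemm_prepare_data
```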

bias: present if and only if !no_bias. This is added to the output after scaling and has the same number of columns as the output.

out_type: type of the output.

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/intgemm_fully_connected_op.cc:L284

Parameters:

data (Symbol) – First argument to multiplication. Tensor of float32 (quantized on the fly) or int8 from intgemm_prepare_data. If you use a different quantizer, be sure to ban -128. The last dimension must be a multiple of 64.
weight (Symbol) – Second argument to multiplication. Tensor of int8 from intgemm_prepare_weight. The last dimension must be a multiple of 64. The product of non-last dimensions must be a multiple of 8.
scaling (Symbol) – Scaling factor to apply if output type is float32.
bias (Symbol) – Bias term.
num_hidden (int, required) – Number of hidden nodes of the output.
no_bias (boolean, optional, default=0) – Whether to disable bias parameter.
flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.
out_type ({'float32', 'int32'},optional, default='float32') – Output data type.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.intgemm_maxabsolute(data=None, name=None, attr=None, out=None, **kwargs)¶

Compute the maximum absolute value in a tensor of float32 fast on a CPU. The tensor’s total size must be a multiple of 16 and aligned to a multiple of 64 bytes. mxnet.nd.contrib.intgemm_maxabsolute(arr) == arr.abs().max()

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/max_absolute_op.cc:L102

Parameters:

data (Symbol) – Tensor to compute maximum absolute value of
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.intgemm_prepare_data(data=None, maxabs=None, name=None, attr=None, out=None, **kwargs)¶

This operator quantizes float32 to int8 while also banning -128.

It is suitable for preparing a data matrix for use by intgemm’s C=data * weights operation.

The float32 values are scaled such that maxabs maps to 127. Typically maxabs = maxabsolute(A).
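
The scaling rule can be sketched in plain Python. The helper `quantize_to_int8` is hypothetical, and round-to-nearest with clamping is an assumption made for illustration; the rounding mode intgemm actually uses is not stated here.

```python
def quantize_to_int8(values, maxabs):
    """Scale so maxabs maps to 127, round, and clamp to [-127, 127] so -128 never appears."""
    scale = 127.0 / maxabs
    return [max(-127, min(127, int(round(v * scale)))) for v in values]

# -3.0 lies outside [-maxabs, maxabs] and is clamped to -127 rather than wrapping.
print(quantize_to_int8([2.0, -2.0, 0.0, -3.0], maxabs=2.0))
```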

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/prepare_data_op.cc:L112

Parameters:

data (Symbol) – Activation matrix to be prepared for multiplication.
maxabs (Symbol) – Maximum absolute value to be used for scaling. (The values will be multiplied by 127.0 / maxabs.)
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.intgemm_prepare_weight(weight=None, maxabs=None, already_quantized=_Null, name=None, attr=None, out=None, **kwargs)¶

This operator converts a weight matrix in column-major format to intgemm’s internal fast representation of weight matrices. MXNet customarily stores weight matrices in column-major (transposed) format. This operator is not meant to be fast; it is meant to be run offline to quantize a model.

In other words, it prepares weight for the operation C = data * weight^T.

If the provided weight matrix is float32, it will be quantized first. The quantization function is (int8_t)(127.0 / max * weight), where max is provided as argument 1 (the weight matrix is argument 0). Then the matrix will be rearranged into the CPU-dependent format.

If the provided weight matrix is already int8, the matrix will only be rearranged into the CPU-dependent format. This way one can quantize with intgemm_prepare_data (which just quantizes), store to disk in a consistent format, then at load time convert to CPU-dependent format with intgemm_prepare_weight.

The internal representation depends on register length. So AVX512, AVX2, and SSSE3 have different formats. AVX512BW and AVX512VNNI have the same representation.

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/prepare_weight_op.cc:L152

Parameters:

weight (Symbol) – Parameter matrix to be prepared for multiplication.
maxabs (Symbol) – Maximum absolute value for scaling. The weights will be multiplied by 127.0 / maxabs.
already_quantized (boolean, optional, default=0) – Is the weight matrix already quantized?
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.intgemm_take_weight(weight=None, indices=None, name=None, attr=None, out=None, **kwargs)¶

Index a weight matrix stored in intgemm’s weight format. The indices select the outputs of matrix multiplication, not the inner dot product dimension.

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/take_weight_op.cc:L125

Parameters:

weight (Symbol) – Tensor already in intgemm weight format to select from
indices (Symbol) – indices to select on the 0th dimension of weight
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.layer_norm(data=None, gamma=None, beta=None, axis=_Null, eps=_Null, output_mean_var=_Null, name=None, attr=None, out=None, **kwargs)¶

Layer normalization.

Normalizes the channels of the input tensor by mean and variance, and applies a scale gamma as well as offset beta.

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis and then compute the normalized output, which has the same shape as the input, as follows:

\[out = \frac{data - mean(data, axis)}{\sqrt{var(data, axis) + \epsilon}} * gamma + beta\]

Both gamma and beta are learnable parameters.

Unlike BatchNorm and InstanceNorm, the mean and var are computed along the channel dimension.

Assume the input has size k on axis 1, then both gamma and beta have shape (k,). If output_mean_var is set to be true, then outputs both data_mean and data_std. Note that no gradient will be passed through these two outputs.

The parameter axis specifies which axis of the input shape denotes the ‘channel’ (separately normalized groups). The default is -1, which sets the channel axis to be the last item in the input shape.
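
For a 2-D input normalized along the last axis (the default axis=-1), the formula reduces to the following plain-Python sketch. The helper `layer_norm_last_axis` is hypothetical and assumes the population variance.

```python
import math

def layer_norm_last_axis(rows, gamma, beta, eps=1e-5):
    """Layer norm over the last axis of a 2-D input given as a list of rows."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        inv_std = 1.0 / math.sqrt(var + eps)
        # gamma and beta have one entry per position on the normalized axis.
        out.append([(v - mean) * inv_std * g + b for v, g, b in zip(row, gamma, beta)])
    return out

y = layer_norm_last_axis([[1.0, 2.0, 3.0]], gamma=[1.0, 1.0, 1.0], beta=[0.0, 0.0, 0.0])
print(y)
```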

Defined in /home/smola/mxnet/src/operator/nn/layer_norm.cc:L401

Parameters:

data (Symbol) – Input data to layer normalization
gamma (Symbol) – gamma array
beta (Symbol) – beta array
axis (int, optional, default='-1') – The axis to perform layer normalization. Usually, this should be the axis of the channel dimension. Negative values mean indexing from right to left.
eps (float, optional, default=9.99999975e-06) – An epsilon parameter to prevent division by 0.
output_mean_var (boolean, optional, default=0) – Output the mean and std calculated along the given axis.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.leaky_relu(data=None, gamma=None, act_type=_Null, slope=_Null, lower_bound=_Null, upper_bound=_Null, name=None, attr=None, out=None, **kwargs)¶

Applies Leaky rectified linear unit activation element-wise to the input.

Leaky ReLUs attempt to fix the “dying ReLU” problem by allowing a small slope when the input is negative while keeping a slope of one when the input is positive.

The following modified ReLU Activation functions are supported:

elu: Exponential Linear Unit. y = x > 0 ? x : slope * (exp(x)-1)
gelu: Gaussian Error Linear Unit. y = 0.5 * x * (1 + erf(x / sqrt(2)))
gelu_erf: Same as gelu.
gelu_tanh: Gaussian Error Linear Unit using tanh function. y = 0.5 * x * (1 + tanh((sqrt(2/pi) * (x + 0.044715*x^3))))
selu: Scaled Exponential Linear Unit. y = lambda * (x > 0 ? x : alpha * (exp(x) - 1)) where lambda = 1.0507009873554804934193349852946 and alpha = 1.6732632423543772848170429916717.
leaky: Leaky ReLU. y = x > 0 ? x : slope * x
prelu: Parametric ReLU. This is the same as leaky except that slope is learnt during training.
rrelu: Randomized ReLU. Same as leaky, but the slope is uniformly and randomly chosen from [lower_bound, upper_bound) for training, while fixed to be (lower_bound+upper_bound)/2 for inference.
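
The deterministic formulas in the list above translate directly into scalar Python functions. This is a sketch for checking values by hand; the function names are illustrative, and prelu and rrelu are left out because their slope is learnt or random.

```python
import math

SELU_LAMBDA = 1.0507009873554804934193349852946
SELU_ALPHA = 1.6732632423543772848170429916717

def leaky(x, slope=0.25):
    return x if x > 0 else slope * x

def elu(x, slope=0.25):
    return x if x > 0 else slope * (math.exp(x) - 1.0)

def selu(x):
    return SELU_LAMBDA * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def gelu_erf(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh-based approximation of gelu_erf
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

for fn in (leaky, elu, selu, gelu_erf, gelu_tanh):
    print(fn.__name__, fn(-1.0), fn(1.0))
```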

Defined in /home/smola/mxnet/src/operator/leaky_relu.cc:L196

Parameters:

data (Symbol) – Input data to activation function.
gamma (Symbol) – Input data to activation function.
act_type ({'elu', 'gelu_erf', 'gelu_tanh', 'leaky', 'prelu', 'rrelu', 'selu'},optional, default='leaky') – Activation function to be applied.
slope (float, optional, default=0.25) – Init slope for the activation. (For leaky and elu only)
lower_bound (float, optional, default=0.125) – Lower bound of random slope. (For rrelu only)
upper_bound (float, optional, default=0.333999991) – Upper bound of random slope. (For rrelu only)
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.log_softmax(data=None, axis=_Null, temperature=_Null, dtype=_Null, use_length=_Null, name=None, attr=None, out=None, **kwargs)¶

Computes the log softmax of the input. This is equivalent to computing softmax followed by log.

Examples:

>>> x = mx.nd.array([1, 2, .1])
>>> mx.nd.log_softmax(x).asnumpy()
array([-1.41702998, -0.41702995, -2.31702995], dtype=float32)

>>> x = mx.nd.array( [[1, 2, .1],[.1, 2, 1]] )
>>> mx.nd.log_softmax(x, axis=0).asnumpy()
array([[-0.34115392, -0.69314718, -1.24115396],
       [-1.24115396, -0.69314718, -0.34115392]], dtype=float32)
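
The first example can be reproduced with a plain-Python sketch. The helper `log_softmax_ref` is hypothetical; it subtracts the maximum before exponentiating, a common stability trick that does not change the result.

```python
import math

def log_softmax_ref(xs):
    """Log softmax of a 1-D list, computed as x - logsumexp(x)."""
    m = max(xs)
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]

out = log_softmax_ref([1.0, 2.0, 0.1])
print(out)
```

Computing `x - logsumexp(x)` directly avoids taking the log of very small softmax values, which is why log softmax is usually preferred over applying log to a softmax output.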

Parameters:

data (Symbol) – The input array.
axis (int, optional, default='-1') – The axis along which to compute softmax.
temperature (double or None, optional, default=None) – Temperature parameter in softmax
dtype ({None, 'float16', 'float32', 'float64'},optional, default='None') – DType of the output in case this can’t be inferred. Defaults to the same as input’s dtype if not defined (dtype=None).
use_length (boolean or None, optional, default=0) – Whether to use the length input as a mask over the data input.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.masked_log_softmax(data=None, mask=None, axis=_Null, temperature=_Null, normalize=_Null, name=None, attr=None, out=None, **kwargs)¶

Computes the masked log softmax of the input. This is equivalent to computing masked softmax followed by log.

Parameters:

data (Symbol) – The input array.
mask (Symbol) – Mask to apply.
axis (int, optional, default='-1') – The axis along which to compute softmax.
temperature (double or None, optional, default=None) – Temperature parameter in softmax
normalize (boolean or None, optional, default=1) – Whether to normalize input data x: x = x - max(x)
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.masked_softmax(data=None, mask=None, axis=_Null, temperature=_Null, normalize=_Null, name=None, attr=None, out=None, **kwargs)¶

Applies the softmax function masking elements according to the mask provided

Defined in /home/smola/mxnet/src/operator/nn/masked_softmax.cc:L74

Parameters:

data (Symbol) – The input array.
mask (Symbol) – Mask to apply.
axis (int, optional, default='-1') – The axis along which to compute softmax.
temperature (double or None, optional, default=None) – Temperature parameter in softmax
normalize (boolean or None, optional, default=1) – Whether to normalize input data x: x = x - max(x)
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.modulated_deformable_convolution(data=None, offset=None, mask=None, weight=None, bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, num_deformable_group=_Null, workspace=_Null, no_bias=_Null, im2col_step=_Null, layout=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute 2-D modulated deformable convolution on 4-D input.

The modulated deformable convolution operation is described in https://arxiv.org/abs/1811.11168

For 2-D modulated deformable convolution, the shapes are

data: (batch_size, channel, height, width)
offset: (batch_size, num_deformable_group * kernel[0] * kernel[1] * 2, height, width)
mask: (batch_size, num_deformable_group * kernel[0] * kernel[1], height, width)
weight: (num_filter, channel, kernel[0], kernel[1])
bias: (num_filter,)
out: (batch_size, num_filter, out_height, out_width).

Define:

f(x,k,p,s,d) = floor((x+2*p-d*(k-1)-1)/s)+1

then we have:

out_height=f(height, kernel[0], pad[0], stride[0], dilate[0])
out_width=f(width, kernel[1], pad[1], stride[1], dilate[1])
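
The output-size formula is easy to check numerically. The sketch below names the function `conv_out_size` for illustration and uses integer floor division in place of floor().

```python
def conv_out_size(x, k, p, s, d):
    """f(x, k, p, s, d) = floor((x + 2*p - d*(k-1) - 1) / s) + 1"""
    return (x + 2 * p - d * (k - 1) - 1) // s + 1

# 3x3 kernel, pad 1, stride 1, dilation 1 keeps the spatial size.
print(conv_out_size(32, k=3, p=1, s=1, d=1))
# 7x7 kernel, pad 3, stride 2 halves a 224-pixel side.
print(conv_out_size(224, k=7, p=3, s=2, d=1))
```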

If no_bias is set to be true, then the bias term is ignored.

The default data layout is NCHW, namely (batch_size, channel, height, width).

If num_deformable_group is larger than 1, denoted by dg, then split the input offset evenly into dg parts along the channel axis, and also split out evenly into dg parts along the channel axis. Next compute the deformable convolution, applying the i-th part of the offset to the i-th part of out.

Both weight and bias are learnable parameters.

Defined in /home/smola/mxnet/src/operator/modulated_deformable_convolution.cc:L83

Parameters:

data (Symbol) – Input data to the ModulatedDeformableConvolutionOp.
offset (Symbol) – Input offset to ModulatedDeformableConvolutionOp.
mask (Symbol) – Input mask to the ModulatedDeformableConvolutionOp.
weight (Symbol) – Weight matrix.
bias (Symbol) – Bias parameter.
kernel (Shape(tuple), required) – Convolution kernel size: (h, w) or (d, h, w)
stride (Shape(tuple), optional, default=[]) – Convolution stride: (h, w) or (d, h, w). Defaults to 1 for each dimension.
dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (h, w) or (d, h, w). Defaults to 1 for each dimension.
pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (h, w) or (d, h, w). Defaults to no padding.
num_filter (int (non-negative), required) – Convolution filter(channel) number
num_group (int (non-negative), optional, default=1) – Number of group partitions.
num_deformable_group (int (non-negative), optional, default=1) – Number of deformable group partitions.
workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed for convolution (MB).
no_bias (boolean, optional, default=0) – Whether to disable bias parameter.
im2col_step (int (non-negative), optional, default=64) – Maximum number of images per im2col computation. The total batch size should be divisible by this value or smaller than this value. If you run out of memory, you can try a smaller value here.
layout ({None, 'NCDHW', 'NCHW', 'NCW'},optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.multibox_detection(cls_prob=None, loc_pred=None, anchor=None, clip=_Null, threshold=_Null, background_id=_Null, nms_threshold=_Null, force_suppress=_Null, variances=_Null, nms_topk=_Null, name=None, attr=None, out=None, **kwargs)¶

Convert multibox detection predictions.

Parameters:

cls_prob (Symbol) – Class probabilities.
loc_pred (Symbol) – Location regression predictions.
anchor (Symbol) – Multibox prior anchor boxes
clip (boolean, optional, default=1) – Clip out-of-boundary boxes.
threshold (float, optional, default=0.00999999978) – Threshold to be a positive prediction.
background_id (int, optional, default='0') – Background id.
nms_threshold (float, optional, default=0.5) – Non-maximum suppression threshold.
force_suppress (boolean, optional, default=0) – Suppress all detections regardless of class_id.
variances (tuple of <float>, optional, default=[0.1,0.1,0.2,0.2]) – Variances to be decoded from box regression output.
nms_topk (int, optional, default='-1') – Keep maximum top k detections before nms, -1 for no limit.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.multibox_prior(data=None, sizes=_Null, ratios=_Null, clip=_Null, steps=_Null, offsets=_Null, name=None, attr=None, out=None, **kwargs)¶

Generate prior(anchor) boxes from data, sizes and ratios.

Parameters:

data (Symbol) – Input data.
sizes (tuple of <float>, optional, default=[1]) – List of sizes of generated MultiBoxPriores.
ratios (tuple of <float>, optional, default=[1]) – List of aspect ratios of generated MultiBoxPriores.
clip (boolean, optional, default=0) – Whether to clip out-of-boundary boxes.
steps (tuple of <float>, optional, default=[-1,-1]) – Priorbox step across y and x, -1 for auto calculation.
offsets (tuple of <float>, optional, default=[0.5,0.5]) – Priorbox center offsets, y and x respectively
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.multibox_target(anchor=None, label=None, cls_pred=None, overlap_threshold=_Null, ignore_label=_Null, negative_mining_ratio=_Null, negative_mining_thresh=_Null, minimum_negative_samples=_Null, variances=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute Multibox training targets

Parameters:

anchor (Symbol) – Generated anchor boxes.
label (Symbol) – Object detection labels.
cls_pred (Symbol) – Class predictions.
overlap_threshold (float, optional, default=0.5) – Anchor-GT overlap threshold to be regarded as a positive match.
ignore_label (float, optional, default=-1) – Label for ignored anchors.
negative_mining_ratio (float, optional, default=-1) – Max negative to positive samples ratio, use -1 to disable mining
negative_mining_thresh (float, optional, default=0.5) – Threshold used for negative mining.
minimum_negative_samples (int, optional, default='0') – Minimum number of negative samples.
variances (tuple of <float>, optional, default=[0.1,0.1,0.2,0.2]) – Variances to be encoded in box regression target.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.nonzero(x=None, name=None, attr=None, out=None, **kwargs)¶

Return the indices of the elements that are non-zero.

Returns a 2-D ndarray in which each row contains the indices of one non-zero element. The values in x are always tested and returned in row-major, C-style order.

The result is always a 2-D array, with one row for each non-zero element.

Parameters:

x (array_like) – Input array.

Returns:

Indices of elements that are non-zero.

Return type:

ndarray

Notes

This function differs from the original numpy.nonzero in the following aspects:

Does not support Python numeric types as input.
The return value is the same as numpy.transpose(numpy.nonzero(a)).

Examples

>>> x = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])
>>> x
array([[3, 0, 0],
       [0, 4, 0],
       [5, 6, 0]])
>>> npx.nonzero(x)
array([[0, 0],
       [1, 1],
       [2, 0],
       [2, 1]], dtype=int64)

>>> np.transpose(npx.nonzero(x))
array([[0, 1, 2, 2],
       [0, 1, 0, 1]], dtype=int64)
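
The row-per-element layout can be reproduced without MXNet. The following is a minimal pure-Python sketch of the semantics described above, not the operator's implementation; the helper names `nonzero_rows` and `transpose` are hypothetical and only 2-D nested lists are handled.

```python
def nonzero_rows(x):
    """Return [row, col] index pairs of the non-zero entries of a 2-D
    nested list, scanned in row-major (C-style) order."""
    return [[i, j]
            for i, row in enumerate(x)
            for j, value in enumerate(row)
            if value != 0]


def transpose(rows):
    """Transpose a list of equal-length rows."""
    return [list(col) for col in zip(*rows)]


x = [[3, 0, 0], [0, 4, 0], [5, 6, 0]]
idx = nonzero_rows(x)
print(idx)             # one row per non-zero element
print(transpose(idx))  # one row per dimension, the grouping numpy.nonzero uses
```

Transposing the result gives one list of indices per dimension, which is the layout that the Notes above contrast with.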

mxnet.symbol.numpy_extension.norm(data=None, ord=_Null, axis=_Null, out_dtype=_Null, keepdims=_Null, name=None, attr=None, out=None, **kwargs)¶

Computes the norm on an ndarray.

This operator computes the norm on an ndarray with the specified axis, depending on the value of the ord parameter. By default, it computes the L2 norm on the entire array. Currently only ord=2 supports sparse ndarrays.

Examples:

x = [[[1, 2],
      [3, 4]],
     [[2, 2],
      [5, 6]]]

norm(x, ord=2, axis=1) = [[3.1622777 4.472136 ]
                          [5.3851647 6.3245554]]

norm(x, ord=1, axis=1) = [[4., 6.],
                          [7., 8.]]

rsp = x.cast_storage('row_sparse')

norm(rsp) = [5.47722578]

csr = x.cast_storage('csr')

norm(csr) = [5.47722578]
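
The axis=1 results above can be checked by hand. This is a small pure-Python sketch under the assumption that the input is a 3-D nested list; `norm_axis1` is a hypothetical helper written for illustration, not an MXNet function.

```python
import math


def norm_axis1(x, ord=2):
    """Reduce a 3-D nested list of shape (n, r, c) along axis 1,
    giving a nested list of shape (n, c)."""
    out = []
    for block in x:
        cols = list(zip(*block))  # each tuple runs along axis 1
        if ord == 1:
            out.append([float(sum(abs(v) for v in col)) for col in cols])
        elif ord == 2:
            out.append([math.sqrt(sum(v * v for v in col)) for col in cols])
        else:
            raise ValueError("only ord=1 and ord=2 are sketched here")
    return out


x = [[[1, 2], [3, 4]],
     [[2, 2], [5, 6]]]
l2 = norm_axis1(x, ord=2)
l1 = norm_axis1(x, ord=1)
print(l2)
print(l1)
```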

Defined in /home/smola/mxnet/src/operator/tensor/broadcast_reduce_norm_value.cc:L88

Parameters:

data (Symbol) – The input
ord (int, optional, default='2') – Order of the norm. Currently ord=1 and ord=2 are supported.
axis (Shape or None, optional, default=None) –

The axis or axes along which to perform the reduction.
The default, axis=(), will compute over all elements into a scalar array with shape (1,). If axis is int, a reduction is performed on a particular axis. If axis is a 2-tuple, it specifies the axes that hold 2-D matrices, and the matrix norms of these matrices are computed.
out_dtype ({None, 'float16', 'float32', 'float64', 'int32', 'int64', 'int8'},optional, default='None') – The data type of the output.
keepdims (boolean, optional, default=0) – If this is set to True, the reduced axis is left in the result as a dimension with size one.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.one_hot(indices=None, depth=_Null, on_value=_Null, off_value=_Null, dtype=_Null, name=None, attr=None, out=None, **kwargs)¶

Returns a one-hot array.

The locations represented by indices take value on_value, while all other locations take value off_value.

one_hot operation with indices of shape (i0, i1) and depth of d would result in an output array of shape (i0, i1, d) with:

output[i,j,:] = off_value
output[i,j,indices[i,j]] = on_value

Examples:

one_hot([1,0,2,0], 3) = [[ 0.  1.  0.]
                         [ 1.  0.  0.]
                         [ 0.  0.  1.]
                         [ 1.  0.  0.]]

one_hot([1,0,2,0], 3, on_value=8, off_value=1,
        dtype='int32') = [[1 8 1]
                          [8 1 1]
                          [1 1 8]
                          [8 1 1]]

one_hot([[1,0],[1,0],[2,0]], 3) = [[[ 0.  1.  0.]
                                    [ 1.  0.  0.]]

                                   [[ 0.  1.  0.]
                                    [ 1.  0.  0.]]

                                   [[ 0.  0.  1.]
                                    [ 1.  0.  0.]]]
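
The indexing rule above amounts to appending one axis of length depth. A minimal pure-Python sketch follows; `one_hot_ref` is a hypothetical name, and leaving out-of-range indices as all off_value is an assumption made here, not something the text above states.

```python
def one_hot_ref(indices, depth, on_value=1.0, off_value=0.0):
    """Expand every index in a (possibly nested) list into a row of
    length depth, appending one trailing dimension."""
    if isinstance(indices, (list, tuple)):
        return [one_hot_ref(i, depth, on_value, off_value) for i in indices]
    row = [off_value] * depth
    if 0 <= indices < depth:  # assumption: out-of-range indices set nothing
        row[indices] = on_value
    return row


flat = one_hot_ref([1, 0, 2, 0], 3)
custom = one_hot_ref([1, 0, 2, 0], 3, on_value=8, off_value=1)
nested = one_hot_ref([[1, 0], [1, 0], [2, 0]], 3)
print(flat)
print(custom)
print(nested)
```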

Defined in /home/smola/mxnet/src/operator/tensor/indexing_op.cc:L969

Parameters:

indices (Symbol) – array of locations where to set on_value
depth (long, required) – Depth of the one hot dimension.
on_value (double, optional, default=1) – The value assigned to the locations represented by indices.
off_value (double, optional, default=0) – The value assigned to the locations not represented by indices.
dtype ({'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='float32') – DType of the output
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.pad(data=None, mode=_Null, pad_width=_Null, constant_value=_Null, name=None, attr=None, out=None, **kwargs)¶

Pads an input array with a constant or edge values of the array.

Note

Pad is deprecated. Use pad instead.

Note

Current implementation only supports 4D and 5D input arrays with padding applied only on axes 1, 2 and 3. Expects axes 4 and 5 in pad_width to be zero.

This operation pads an input array with either a constant_value or edge values along each axis of the input array. The amount of padding is specified by pad_width.

pad_width is a tuple of integer padding widths for each axis of the format (before_1, after_1, ... , before_N, after_N). The pad_width should be of length 2*N where N is the number of dimensions of the array.

For dimension N of the input array, before_N and after_N indicate how many values to add before and after the elements of the array along dimension N. The widths of the higher two dimensions before_1, after_1, before_2, after_2 must be 0.

Example:

x = [[[[  1.   2.   3.]
       [  4.   5.   6.]]

      [[  7.   8.   9.]
       [ 10.  11.  12.]]]


     [[[ 11.  12.  13.]
       [ 14.  15.  16.]]

      [[ 17.  18.  19.]
       [ 20.  21.  22.]]]]

pad(x,mode="edge", pad_width=(0,0,0,0,1,1,1,1)) =

      [[[[  1.   1.   2.   3.   3.]
         [  1.   1.   2.   3.   3.]
         [  4.   4.   5.   6.   6.]
         [  4.   4.   5.   6.   6.]]

        [[  7.   7.   8.   9.   9.]
         [  7.   7.   8.   9.   9.]
         [ 10.  10.  11.  12.  12.]
         [ 10.  10.  11.  12.  12.]]]


       [[[ 11.  11.  12.  13.  13.]
         [ 11.  11.  12.  13.  13.]
         [ 14.  14.  15.  16.  16.]
         [ 14.  14.  15.  16.  16.]]

        [[ 17.  17.  18.  19.  19.]
         [ 17.  17.  18.  19.  19.]
         [ 20.  20.  21.  22.  22.]
         [ 20.  20.  21.  22.  22.]]]]

pad(x, mode="constant", constant_value=0, pad_width=(0,0,0,0,1,1,1,1)) =

      [[[[  0.   0.   0.   0.   0.]
         [  0.   1.   2.   3.   0.]
         [  0.   4.   5.   6.   0.]
         [  0.   0.   0.   0.   0.]]

        [[  0.   0.   0.   0.   0.]
         [  0.   7.   8.   9.   0.]
         [  0.  10.  11.  12.   0.]
         [  0.   0.   0.   0.   0.]]]


       [[[  0.   0.   0.   0.   0.]
         [  0.  11.  12.  13.   0.]
         [  0.  14.  15.  16.   0.]
         [  0.   0.   0.   0.   0.]]

        [[  0.   0.   0.   0.   0.]
         [  0.  17.  18.  19.   0.]
         [  0.  20.  21.  22.   0.]
         [  0.   0.   0.   0.   0.]]]]
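
The two modes shown above can be sketched in pure Python for a 4-D nested list. `pad2d` and `pad4d` are hypothetical helpers written for illustration; they cover only the "constant" and "edge" modes and assume the first four entries of pad_width are zero, as required above.

```python
def pad2d(mat, top, bottom, left, right, mode="constant", constant_value=0.0):
    """Pad one 2-D nested list with a constant or with its edge values."""
    if mode not in ("constant", "edge"):
        raise ValueError("only 'constant' and 'edge' are sketched here")
    rows = []
    for row in mat:
        lfill = row[0] if mode == "edge" else constant_value
        rfill = row[-1] if mode == "edge" else constant_value
        rows.append([lfill] * left + list(row) + [rfill] * right)
    width = len(rows[0])
    first = rows[0] if mode == "edge" else [constant_value] * width
    last = rows[-1] if mode == "edge" else [constant_value] * width
    return ([list(first) for _ in range(top)] + rows
            + [list(last) for _ in range(bottom)])


def pad4d(x, pad_width, mode="constant", constant_value=0.0):
    """Pad the last two axes of a 4-D nested list.

    pad_width is the flattened 8-tuple (before_1, after_1, ..., before_4,
    after_4); the first four entries must be 0.
    """
    if tuple(pad_width[:4]) != (0, 0, 0, 0):
        raise ValueError("padding on the first two axes must be 0")
    top, bottom, left, right = pad_width[4:]
    return [[pad2d(m, top, bottom, left, right, mode, constant_value)
             for m in channel] for channel in x]


x = [[[[1., 2., 3.], [4., 5., 6.]], [[7., 8., 9.], [10., 11., 12.]]],
     [[[11., 12., 13.], [14., 15., 16.]], [[17., 18., 19.], [20., 21., 22.]]]]
edge = pad4d(x, (0, 0, 0, 0, 1, 1, 1, 1), mode="edge")
const = pad4d(x, (0, 0, 0, 0, 1, 1, 1, 1), mode="constant", constant_value=0.0)
print(edge[0][0])
print(const[0][0])
```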

Defined in /home/smola/mxnet/src/operator/pad.cc:L772

Parameters:

data (Symbol) – An n-dimensional input array.
mode ({'constant', 'edge', 'reflect'}, required) – Padding type to use. “constant” pads with constant_value; “edge” pads using the edge values of the input array; “reflect” pads by reflecting values with respect to the edges.
pad_width (Shape(tuple), required) – Widths of the padding regions applied to the edges of each axis. It is a tuple of integer padding widths for each axis of the format (before_1, after_1, ... , before_N, after_N). It should be of length 2*N where N is the number of dimensions of the array. This is equivalent to pad_width in numpy.pad, but flattened.
constant_value (double, optional, default=0) – The value used for padding when mode is “constant”.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.pick(data=None, index=None, axis=_Null, keepdims=_Null, mode=_Null, name=None, attr=None, out=None, **kwargs)¶

Picks elements from an input array according to the input indices along the given axis.

Given an input array of shape (d0, d1) and indices of shape (i0,), the result will be an output array of shape (i0,) with:

output[i] = input[i, indices[i]]

By default, if any index mentioned is too large, it is replaced by the index that addresses the last element along an axis (the clip mode).

This function supports n-dimensional input and (n-1)-dimensional indices arrays.

Examples:

x = [[ 1.,  2.],
     [ 3.,  4.],
     [ 5.,  6.]]

// picks elements with specified indices along axis 0
pick(x, y=[0,1], 0) = [ 1.,  4.]

// picks elements with specified indices along axis 1
pick(x, y=[0,1,0], 1) = [ 1.,  4.,  5.]

// picks elements with specified indices along axis 1 using 'wrap' mode
// to place indices that would normally be out of bounds
pick(x, y=[2,-1,-2], 1, mode='wrap') = [ 1.,  4.,  5.]

y = [[ 1.],
     [ 0.],
     [ 2.]]

// picks elements with specified indices along axis 1 and dims are maintained
pick(x, y, 1, keepdims=True) = [[ 2.],
                               [ 3.],
                               [ 6.]]
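
The axis-1 examples above, including the 'clip' and 'wrap' handling, can be sketched in pure Python. `pick_axis1` is a hypothetical helper that takes a flat list of indices; clamping negative indices to 0 in 'clip' mode is an assumption made here.

```python
def pick_axis1(data, index, mode="clip", keepdims=False):
    """For each row i of a 2-D nested list, return data[i][index[i]]."""
    out = []
    for row, idx in zip(data, index):
        idx = int(idx)
        n = len(row)
        if mode == "clip":
            idx = min(max(idx, 0), n - 1)  # clamp into [0, n - 1]
        elif mode == "wrap":
            idx %= n                       # Python's % already yields 0..n-1
        else:
            raise ValueError("mode must be 'clip' or 'wrap'")
        out.append([row[idx]] if keepdims else row[idx])
    return out


x = [[1., 2.], [3., 4.], [5., 6.]]
picked = pick_axis1(x, [0, 1, 0])
wrapped = pick_axis1(x, [2, -1, -2], mode="wrap")
kept = pick_axis1(x, [1, 0, 2], keepdims=True)
print(picked, wrapped, kept)
```

In the last call the index 2 is out of range for rows of length 2, so 'clip' mode replaces it with the last valid index.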

Defined in /home/smola/mxnet/src/operator/tensor/broadcast_reduce_op_index.cc:L151

Parameters:

data (Symbol) – The input array
index (Symbol) – The index array
axis (int or None, optional, default='-1') – int or None. The axis along which to pick the elements. Negative values mean indexing from right to left. If None, the elements in the index w.r.t. the flattened input will be picked.
keepdims (boolean, optional, default=0) – If true, the axis where we pick the elements is left in the result as a dimension with size one.
mode ({'clip', 'wrap'},optional, default='clip') – Specify how out-of-bound indices behave. Default is “clip”. “clip” means clip to the range, so any index that is too large is replaced by the index that addresses the last element along an axis. “wrap” means to wrap around.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.pooling(data=None, kernel=_Null, pool_type=_Null, global_pool=_Null, cudnn_off=_Null, pooling_convention=_Null, stride=_Null, pad=_Null, p_value=_Null, count_include_pad=_Null, layout=_Null, output_size=_Null, name=None, attr=None, out=None, **kwargs)¶

Performs pooling on the input.

The shapes for 1-D pooling are

data and out: (batch_size, channel, width) (NCW layout) or (batch_size, width, channel) (NWC layout),

The shapes for 2-D pooling are

data and out: (batch_size, channel, height, width) (NCHW layout) or (batch_size, height, width, channel) (NHWC layout),

out_height = f(height, kernel[0], pad[0], stride[0])
out_width = f(width, kernel[1], pad[1], stride[1])

The definition of f depends on pooling_convention, which has two options:

valid (default):
```
f(x, k, p, s) = floor((x+2*p-k)/s)+1
```
full, which is compatible with Caffe:
```
f(x, k, p, s) = ceil((x+2*p-k)/s)+1
```
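
The two conventions differ only in the rounding direction. A short sketch of both formulas, using a hypothetical helper `pooled_size` and integer arithmetic:

```python
def pooled_size(x, k, p, s, pooling_convention="valid"):
    """Output length along one axis for input length x, kernel k,
    padding p and stride s."""
    span = x + 2 * p - k
    if pooling_convention == "valid":
        return span // s + 1        # floor(span / s) + 1
    if pooling_convention == "full":
        return -(-span // s) + 1    # ceil(span / s) + 1
    raise ValueError("only 'valid' and 'full' are sketched here")


valid_out = pooled_size(7, 2, 0, 2, "valid")
full_out = pooled_size(7, 2, 0, 2, "full")
print(valid_out, full_out)
```

With x=7, k=2, p=0, s=2 the quotient is 2.5, so 'valid' drops the last partial window while 'full' keeps it.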

When global_pool is set to be true, then global pooling is performed. It will reset kernel=(height, width) and set the appropriate padding to 0.

Four pooling options are supported by pool_type:

avg: average pooling
max: max pooling
sum: sum pooling
lp: Lp pooling

For 3-D pooling, an additional depth dimension is added before height. Namely the input data and output will have shape (batch_size, channel, depth, height, width) (NCDHW layout) or (batch_size, depth, height, width, channel) (NDHWC layout).

Notes on Lp pooling:

Lp pooling was first introduced by this paper: https://arxiv.org/pdf/1204.3968.pdf. L-1 pooling is simply sum pooling, while L-inf pooling is simply max pooling, so Lp pooling stands between those two. In practice, the most common value for p is 2.

For each window X, the mathematical expression for Lp pooling is:

\(f(X) = \sqrt[p]{\sum_{x \in X} x^p}\)
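
The expression can be evaluated directly on one window to see how p moves the result between sum pooling and max pooling. The helper `lp_pool_window` is hypothetical, and the window values are assumed non-negative so that x^p is well defined for any p.

```python
def lp_pool_window(window, p):
    """Lp pooling of a single window: the p-th root of the sum of x**p."""
    return sum(v ** p for v in window) ** (1.0 / p)


window = [1.0, 2.0, 3.0, 4.0]
p1 = lp_pool_window(window, 1)    # same as sum pooling
p2 = lp_pool_window(window, 2)    # the common choice p = 2
p64 = lp_pool_window(window, 64)  # a large p approaches max pooling
print(p1, p2, p64)
```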

Defined in /home/smola/mxnet/src/operator/nn/pooling.cc:L410

Parameters:

data (Symbol) – Input data to the pooling operator.
kernel (Shape(tuple), optional, default=[]) – Pooling kernel size: (y, x) or (d, y, x)
pool_type ({'avg', 'lp', 'max', 'sum'},optional, default='max') – Pooling type to be applied.
global_pool (boolean, optional, default=0) – Ignore kernel size, do global pooling based on current input feature map.
cudnn_off (boolean, optional, default=0) – Turn off cudnn pooling and use MXNet pooling operator.
pooling_convention ({'full', 'same', 'valid'},optional, default='valid') – Pooling convention to be applied.
stride (Shape(tuple), optional, default=[]) – Stride: for pooling (y, x) or (d, y, x). Defaults to 1 for each dimension.
pad (Shape(tuple), optional, default=[]) – Pad for pooling: (y, x) or (d, y, x). Defaults to no padding.
p_value (int or None, optional, default='None') – Value of p for Lp pooling, can be 1 or 2, required for Lp Pooling.
count_include_pad (boolean or None, optional, default=None) – Only used for AvgPool, specify whether to count padding elements for average calculation. For example, with a 5*5 kernel on a 3*3 corner of an image, the sum of the 9 valid elements will be divided by 25 if this is set to true, or it will be divided by 9 if this is set to false. Defaults to true.
layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') –

Set layout for input and output. Empty for
default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.
output_size (Shape or None, optional, default=None) – Only used for Adaptive Pooling. int (output size) or a tuple of int for output (height, width).
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol
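
The count_include_pad example in the parameter list above (a 5*5 kernel covering a 3*3 corner) reduces to a choice of divisor. A small arithmetic sketch, with the hypothetical helper `corner_average` and made-up input values:

```python
def corner_average(valid_values, kernel_area, count_include_pad=True):
    """Average over one window in which only valid_values lie inside the
    image; the remaining kernel positions fall on zero padding."""
    divisor = kernel_area if count_include_pad else len(valid_values)
    return sum(valid_values) / divisor


valid = [1.0] * 9  # the 9 valid elements of the 3*3 corner (illustrative data)
with_pad = corner_average(valid, 25, count_include_pad=True)
without_pad = corner_average(valid, 25, count_include_pad=False)
print(with_pad, without_pad)
```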

mxnet.symbol.numpy_extension.quantized_act(data=None, min_data=None, max_data=None, act_type=_Null, name=None, attr=None, out=None, **kwargs)¶

Activation operator for input and output data type of int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.

Note

This operator only supports forward propagation. DO NOT use it in training. This operator only supports relu.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_activation.cc:L96

Parameters:

data (Symbol) – Input data.
min_data (Symbol) – Minimum value of data.
max_data (Symbol) – Maximum value of data.
act_type ({'log_sigmoid', 'mish', 'relu', 'sigmoid', 'softrelu', 'softsign', 'tanh'}, required) – Activation function to be applied.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_conv(data=None, weight=None, bias=None, min_data=None, max_data=None, min_weight=None, max_weight=None, min_bias=None, max_bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, workspace=_Null, no_bias=_Null, cudnn_tune=_Null, cudnn_off=_Null, layout=_Null, name=None, attr=None, out=None, **kwargs)¶

Convolution operator for input, weight and bias data type of int8, and accumulates in type int32 for the output. For each argument, two more arguments of type float32 must be provided representing the thresholds for quantizing that argument from data type float32 to int8. The final outputs contain the convolution result in int32, and min and max thresholds representing the thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_conv.cc:L189

Parameters:

data (Symbol) – Input data.
weight (Symbol) – weight.
bias (Symbol) – bias.
min_data (Symbol) – Minimum value of data.
max_data (Symbol) – Maximum value of data.
min_weight (Symbol) – Minimum value of weight.
max_weight (Symbol) – Maximum value of weight.
min_bias (Symbol) – Minimum value of bias.
max_bias (Symbol) – Maximum value of bias.
kernel (Shape(tuple), required) – Convolution kernel size: (w,), (h, w) or (d, h, w)
stride (Shape(tuple), optional, default=[]) – Convolution stride: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.
dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.
pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (w,), (h, w) or (d, h, w). Defaults to no padding.
num_filter (int (non-negative), required) – Convolution filter(channel) number
num_group (int (non-negative), optional, default=1) – Number of group partitions.
workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed (MB) in convolution. This parameter has two usages. When CUDNN is not used, it determines the effective batch size of the convolution kernel. When CUDNN is used, it controls the maximum temporary storage used for tuning the best CUDNN kernel when limited_workspace strategy is used.
no_bias (boolean, optional, default=0) – Whether to disable bias parameter.
cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algo by running performance test.
cudnn_off (boolean, optional, default=0) – Turn off cudnn for this layer.
layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') –

Set layout for input, output and weight. Empty for
default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d. NHWC and NDHWC are only supported on GPU.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_elemwise_add(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)¶

elemwise_add operator for input dataA and input dataB of data type int8, and accumulates in type int32 for the output. For each argument, two more arguments of type float32 must be provided representing the thresholds for quantizing that argument from data type float32 to int8. The final outputs contain the result in int32, and min and max thresholds representing the thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Parameters:

min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.
max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.
lhs (Symbol) – first input
rhs (Symbol) – second input
lhs_min (Symbol) – 3rd input
lhs_max (Symbol) – 4th input
rhs_min (Symbol) – 5th input
rhs_max (Symbol) – 6th input
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_elemwise_mul(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, min_calib_range=_Null, max_calib_range=_Null, enable_float_output=_Null, name=None, attr=None, out=None, **kwargs)¶

Multiplies arguments int8 element-wise.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_elemwise_mul.cc:L255

Parameters:

lhs (Symbol) – first input
rhs (Symbol) – second input
lhs_min (Symbol) – Minimum value of first input.
lhs_max (Symbol) – Maximum value of first input.
rhs_min (Symbol) – Minimum value of second input.
rhs_max (Symbol) – Maximum value of second input.
min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.
max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.
enable_float_output (boolean, optional, default=0) – Whether to enable float32 output
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_embedding(data=None, weight=None, min_weight=None, max_weight=None, input_dim=_Null, output_dim=_Null, dtype=_Null, sparse_grad=_Null, name=None, attr=None, out=None, **kwargs)¶

Maps integer indices to int8 vector representations (embeddings).

Defined in /home/smola/mxnet/src/operator/quantization/quantized_indexing_op.cc:L144

Parameters:

data (Symbol) – The input array to the embedding operator.
weight (Symbol) – The embedding weight matrix.
min_weight (Symbol) – Minimum value of data.
max_weight (Symbol) – Maximum value of data.
input_dim (long, required) – Vocabulary size of the input indices.
output_dim (long, required) – Dimension of the embedding vectors.
dtype ({'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='float32') – Data type of weight.
sparse_grad (boolean, optional, default=0) – Compute row sparse gradient in the backward calculation. If set to True, the grad’s storage type is row_sparse.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_flatten(data=None, min_data=None, max_data=None, name=None, attr=None, out=None, **kwargs)¶

Parameters:

data (Symbol) – An ndarray/symbol of type float32
min_data (Symbol) – The minimum scalar value possibly produced for the data
max_data (Symbol) – The maximum scalar value possibly produced for the data
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_fully_connected(data=None, weight=None, bias=None, min_data=None, max_data=None, min_weight=None, max_weight=None, min_bias=None, max_bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, name=None, attr=None, out=None, **kwargs)¶

Fully Connected operator for input, weight and bias data type of int8, and accumulates in type int32 for the output. For each argument, two more arguments of type float32 must be provided representing the thresholds for quantizing that argument from data type float32 to int8. The final outputs contain the fully connected result in int32, and min and max thresholds representing the thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_fully_connected.cc:L328

Parameters:

data (Symbol) – Input data.
weight (Symbol) – weight.
bias (Symbol) – bias.
min_data (Symbol) – Minimum value of data.
max_data (Symbol) – Maximum value of data.
min_weight (Symbol) – Minimum value of weight.
max_weight (Symbol) – Maximum value of weight.
min_bias (Symbol) – Minimum value of bias.
max_bias (Symbol) – Maximum value of bias.
num_hidden (int, required) – Number of hidden nodes of the output.
no_bias (boolean, optional, default=0) – Whether to disable bias parameter.
flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_npi_add(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)¶

Note

This operator only supports forward propagation. DO NOT use it in training.

Parameters:

min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.
max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.
lhs (Symbol) – first input
rhs (Symbol) – second input
lhs_min (Symbol) – 3rd input
lhs_max (Symbol) – 4th input
rhs_min (Symbol) – 5th input
rhs_max (Symbol) – 6th input
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_pooling(data=None, min_data=None, max_data=None, kernel=_Null, pool_type=_Null, global_pool=_Null, cudnn_off=_Null, pooling_convention=_Null, stride=_Null, pad=_Null, p_value=_Null, count_include_pad=_Null, layout=_Null, output_size=_Null, name=None, attr=None, out=None, **kwargs)¶

Pooling operator for input and output data type of int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.

Note

This operator only supports pool_type of avg or max. Backward propagation computes the data gradient and returns zero min/max gradients.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_pooling.cc:L443

Parameters:

data (Symbol) – Input data.
min_data (Symbol) – Minimum value of data.
max_data (Symbol) – Maximum value of data.
kernel (Shape(tuple), optional, default=[]) – Pooling kernel size: (y, x) or (d, y, x)
pool_type ({'avg', 'lp', 'max', 'sum'},optional, default='max') – Pooling type to be applied.
global_pool (boolean, optional, default=0) – Ignore kernel size, do global pooling based on current input feature map.
cudnn_off (boolean, optional, default=0) – Turn off cudnn pooling and use MXNet pooling operator.
pooling_convention ({'full', 'same', 'valid'},optional, default='valid') – Pooling convention to be applied.
stride (Shape(tuple), optional, default=[]) – Stride: for pooling (y, x) or (d, y, x). Defaults to 1 for each dimension.
pad (Shape(tuple), optional, default=[]) – Pad for pooling: (y, x) or (d, y, x). Defaults to no padding.
p_value (int or None, optional, default='None') – Value of p for Lp pooling, can be 1 or 2, required for Lp Pooling.
count_include_pad (boolean or None, optional, default=None) – Only used for AvgPool, specify whether to count padding elements for average calculation. For example, with a 5*5 kernel on a 3*3 corner of an image, the sum of the 9 valid elements will be divided by 25 if this is set to true, or it will be divided by 9 if this is set to false. Defaults to true.
layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') –

Set layout for input and output. Empty for
default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.
output_size (Shape or None, optional, default=None) – Only used for Adaptive Pooling. int (output size) or a tuple of int for output (height, width).
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.quantized_reshape(data=None, min_data=None, max_data=None, newshape=_Null, reverse=_Null, order=_Null, name=None, attr=None, out=None, **kwargs)¶

Parameters:

data (Symbol) – Array to be reshaped.
min_data (Symbol) – The minimum scalar value possibly produced for the data
max_data (Symbol) – The maximum scalar value possibly produced for the data
newshape (Shape(tuple), required) – The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions. -2 to -6 are used for data manipulation: -2 copies this dimension from the input to the output shape; -3 skips the current dimension if and only if the current dim size is one; -4 copies all remaining input dimensions to the output shape; -5 uses the product of two consecutive dimensions of the input shape as the output; -6 splits one dimension of the input into the two dimensions that follow -6 in the new shape.
reverse (boolean, optional, default=0) – If true, the special values are inferred from right to left.
order (string, optional, default='C') – Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read/write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. Note that currently only C-like order is supported.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol
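
The special newshape values listed for quantized_reshape can be read as a small shape-inference procedure. The sketch below follows only the wording of the parameter description; `infer_reshape` is a hypothetical helper, it ignores reverse, and it is not taken from the MXNet source.

```python
def _prod(values):
    result = 1
    for v in values:
        result *= v
    return result


def infer_reshape(in_shape, newshape):
    """Resolve the special codes in newshape against in_shape, left to right."""
    out = []
    unknown = None  # position of a -1 to be inferred at the end
    i = 0           # cursor into in_shape
    j = 0           # cursor into newshape
    while j < len(newshape):
        d = newshape[j]
        if d == -1:      # inferred later from the total element count
            unknown = len(out)
            out.append(1)
            i += 1
        elif d == -2:    # copy this input dimension
            out.append(in_shape[i])
            i += 1
        elif d == -3:    # skip an input dimension of size one
            if in_shape[i] != 1:
                raise ValueError("-3 requires a dimension of size 1")
            i += 1
        elif d == -4:    # copy all remaining input dimensions
            out.extend(in_shape[i:])
            i = len(in_shape)
        elif d == -5:    # merge two consecutive input dimensions
            out.append(in_shape[i] * in_shape[i + 1])
            i += 2
        elif d == -6:    # split one input dimension into the next two values
            a, b = newshape[j + 1], newshape[j + 2]
            if a == -1:
                a = in_shape[i] // b
            elif b == -1:
                b = in_shape[i] // a
            if a * b != in_shape[i]:
                raise ValueError("split sizes do not match the dimension")
            out.extend([a, b])
            i += 1
            j += 2
        else:            # plain positive dimension
            out.append(d)
            i += 1
        j += 1
    if unknown is not None:
        out[unknown] = _prod(in_shape) // _prod(out)
    return tuple(out)


print(infer_reshape((2, 3, 8), (-2, -2, 2, -1)))
print(infer_reshape((8, 3, 3, 3, 4, 4), (-6, 2, -1, -4)))
print(infer_reshape((8, 3, 3, 3, 4, 4), (-5, -4)))
```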

mxnet.symbol.numpy_extension.quantized_transpose(data=None, min_data=None, max_data=None, axes=_Null, name=None, attr=None, out=None, **kwargs)¶

Parameters:

data (Symbol) – Array to be transposed.
min_data (Symbol) – The minimum scalar value possibly produced for the data
max_data (Symbol) – The maximum scalar value possibly produced for the data
axes (Shape(tuple), optional, default=None) – By default, reverse the dimensions, otherwise permute the axes according to the values given.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.relu(data=None, name=None, attr=None, out=None, **kwargs)¶

Computes rectified linear activation.

\[max(features, 0)\]
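
Element-wise, the formula keeps positive values and replaces negative ones with zero. A one-line pure-Python illustration, with the hypothetical helper `relu_ref`:

```python
def relu_ref(features):
    """Apply max(v, 0) to every element of a flat list."""
    return [max(v, 0.0) for v in features]


activated = relu_ref([-2.0, -0.5, 0.0, 1.5])
print(activated)
```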

Defined in /home/smola/mxnet/src/operator/numpy/np_elemwise_unary_op_basic.cc:L38

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.requantize(data=None, min_range=None, max_range=None, out_type=_Null, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)¶

Given data that is quantized in int32 and the corresponding thresholds, requantize the data into int8 using min and max thresholds either calculated at runtime or from calibration. It’s highly recommended to pre-calculate the min and max thresholds through calibration since it is able to save the runtime of the operator and improve the inference accuracy.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/requantize.cc:L83
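
The difference between runtime and calibrated thresholds can be illustrated with a rough pure-Python sketch. It assumes a symmetric linear scheme in which the largest absolute threshold maps onto the largest representable integer; this scheme, the helper `requantize_ref`, and the constants are assumptions made for illustration, and the actual kernel may differ in rounding and range handling.

```python
INT32_MAX = 2 ** 31 - 1
INT8_MAX = 127


def requantize_ref(values_int32, min_range, max_range,
                   min_calib_range=None, max_calib_range=None):
    """Map int32-quantized values to int8 (assumed symmetric scheme)."""
    # Recover approximate float values from the int32 representation.
    real_in = max(abs(min_range), abs(max_range))
    floats = [v * real_in / INT32_MAX for v in values_int32]
    if min_calib_range is not None and max_calib_range is not None:
        # Thresholds supplied by calibration.
        real_out = max(abs(min_calib_range), abs(max_calib_range))
    else:
        # Thresholds computed at runtime from the data itself.
        real_out = max(abs(f) for f in floats)
    real_out = real_out or 1.0  # avoid division by zero for all-zero data
    scale = INT8_MAX / real_out
    quantized = [max(-INT8_MAX, min(INT8_MAX, round(f * scale)))
                 for f in floats]
    return quantized, -real_out, real_out


data = [INT32_MAX, 0, -INT32_MAX]
runtime_q, runtime_min, runtime_max = requantize_ref(data, -2.0, 2.0)
calib_q, calib_min, calib_max = requantize_ref(data, -2.0, 2.0, -8.0, 8.0)
print(runtime_q, runtime_min, runtime_max)
print(calib_q, calib_min, calib_max)
```

In this sketch the runtime path has to scan the data for its extreme value, which is the extra work that calibrated thresholds avoid.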

Parameters:

data (Symbol) – An ndarray/symbol of type int32
min_range (Symbol) – The original minimum scalar value in the form of float32 used for quantizing data into int32.
max_range (Symbol) – The original maximum scalar value in the form of float32 used for quantizing data into int32.
out_type ({'auto', 'int8', 'uint8'},optional, default='int8') – Output data type. auto can be specified to automatically determine output type according to min_calib_range.
min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int32 data into int8.
max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int32 data into int8.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol
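The arithmetic behind requantization can be illustrated with a small pure-Python sketch. It assumes symmetric scaling (the int32 input spans `[-max_abs, max_abs]` of the original range and the int8 output spans `[-127, 127]` of the calibrated range) and round-half-up on magnitudes; the function name `requantize_sketch` is hypothetical, and the exact rounding and scaling used by the operator may differ.

```python
import math

INT32_MAX = 2**31 - 1
INT8_MAX = 127

def requantize_sketch(values, min_range, max_range, min_calib_range, max_calib_range):
    """Map int32-quantized values to int8 using a calibrated range (assumed symmetric)."""
    # Scale that turns an int32 code back into a real value.
    in_scale = max(abs(min_range), abs(max_range)) / INT32_MAX
    # Real-valued range that the int8 output should cover.
    out_range = max(abs(min_calib_range), abs(max_calib_range))
    result = []
    for v in values:
        real = v * in_scale
        q = int(math.floor(abs(real) * INT8_MAX / out_range + 0.5))
        q = min(q, INT8_MAX)  # saturate values outside the calibrated range
        result.append(q if real >= 0 else -q)
    return result, -out_range, out_range

codes, out_min, out_max = requantize_sketch(
    [0, 2**28, INT32_MAX, -INT32_MAX], -1.0, 1.0, -0.5, 0.5)
print(codes, out_min, out_max)
```

Values whose real magnitude exceeds the calibrated range saturate at the int8 limits, which is why a well-chosen calibration range matters for accuracy.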

mxnet.symbol.numpy_extension.reshape(a=None, newshape=_Null, reverse=_Null, order=_Null, name=None, attr=None, out=None, **kwargs)¶

Gives a new shape to an array without changing its data. This function always returns a copy of the input array if out is not provided.

Parameters:

a (ndarray) – Array to be reshaped.
newshape (int or tuple of ints) –
The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions. -2 to -6 are used for data manipulation.
- -2 copy this dimension from the input to the output shape.
- -3 will skip current dimension if and only if the current dim size is one.
- -4 copy all remaining input dimensions to the output shape.
- -5 use the product of two consecutive dimensions of the input shape as the output.
- -6 split one dimension of the input into two dimensions passed subsequent to -6 in the new shape.
reverse (bool, optional) – If set to true, the special values will be inferred from right to left.
order ({'C'}, optional) – Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. Other order types such as ‘F’/’A’ may be added in the future.

Returns:

reshaped_array – Always a copy of the original array. This differs from the official NumPy reshape operator, which may return a view of the original array.

Return type:

ndarray

Examples

>>> x = np.ones((2, 3, 8))
>>> npx.reshape(x, (-2, -2, 2, -1)).shape
(2, 3, 2, 4)
>>> x = np.ones((8, 3, 3, 3, 4, 4))
>>> npx.reshape(x, (-6, 2, -1, -4)).shape
(2, 4, 3, 3, 3, 4, 4)
>>> x = np.ones((8, 3, 3, 3, 4, 4))
>>> npx.reshape(x, (-5, -4)).shape
(24, 3, 3, 4, 4)
>>> x = np.ones((8, 1, 1, 1, 3))
>>> npx.reshape(x, (-2, -3, -3, -3, -2)).shape
(8, 3)
>>> x = np.ones((8, 3, 3, 3, 3, 8))
>>> npx.reshape(x, (-4, -5), reverse=True).shape
(8, 3, 3, 3, 24)
>>> x = np.ones((8, 3, 2, 4, 8))
>>> npx.reshape(x, (-4, -1, 2, -6), reverse=True).shape
(8, 3, 2, 4, 4, 2)
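The special values can be easier to follow as a shape-inference routine. The sketch below is a pure-Python approximation that only computes the output shape; `infer_reshape` is a hypothetical helper, and it assumes that an explicit size or -1 also consumes one input dimension and that `reverse=True` is handled by reversing both the input shape and `newshape`, processing them, and reversing the result.

```python
from math import prod

def infer_reshape(in_shape, newshape, reverse=False):
    """Infer the output shape for the special codes -1 to -6 (shape only, no data)."""
    src, spec = list(in_shape), list(newshape)
    if reverse:
        src.reverse()
        spec.reverse()
    out, unknown, i, j = [], None, 0, 0
    while j < len(spec):
        code = spec[j]
        if code == -1:                    # infer this dimension later
            if unknown is not None:
                raise ValueError("only one -1 is allowed")
            unknown = len(out)
            out.append(1)                 # neutral placeholder for the product
            i += 1
        elif code == -2:                  # copy this dimension
            out.append(src[i])
            i += 1
        elif code == -3:                  # skip a dimension of size one
            if src[i] != 1:
                raise ValueError("-3 requires a dimension of size 1")
            i += 1
        elif code == -4:                  # copy all remaining dimensions
            out.extend(src[i:])
            i = len(src)
        elif code == -5:                  # merge two consecutive dimensions
            out.append(src[i] * src[i + 1])
            i += 2
        elif code == -6:                  # split one dimension into the next two values
            d0, d1, d2 = src[i], spec[j + 1], spec[j + 2]
            if d1 == -1 and d2 == -1:
                raise ValueError("at most one split size may be -1")
            if d1 == -1:
                d1 = d0 // d2
            if d2 == -1:
                d2 = d0 // d1
            if d1 * d2 != d0:
                raise ValueError("split sizes do not match the dimension")
            out.extend([d1, d2])
            i += 1
            j += 2
        else:                             # explicit size
            out.append(code)
            i += 1
        j += 1
    if unknown is not None:
        out[unknown] = prod(in_shape) // prod(out)
    if prod(out) != prod(in_shape):
        raise ValueError("total size must be unchanged")
    if reverse:
        out.reverse()
    return tuple(out)

print(infer_reshape((8, 3, 2, 4, 8), (-4, -1, 2, -6), reverse=True))
```

Running the helper on the shapes from the examples above is a quick way to check a `newshape` before building a symbol.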

mxnet.symbol.numpy_extension.reshape_like(lhs=None, rhs=None, lhs_begin=_Null, lhs_end=_Null, rhs_begin=_Null, rhs_end=_Null, name=None, attr=None, out=None, **kwargs)¶

Reshape some or all dimensions of lhs to have the same shape as some or all dimensions of rhs.

Returns a view of the lhs array with a new shape without altering any data.

Example:

x = [1, 2, 3, 4, 5, 6]
y = [[0, -4], [3, 2], [2, 2]]
reshape_like(x, y) = [[1, 2], [3, 4], [5, 6]]

More precise control over how dimensions are inherited is achieved by specifying slices over the lhs and rhs array dimensions. Only the sliced lhs dimensions are reshaped to the rhs sliced dimensions, with the non-sliced lhs dimensions staying the same.

Examples:

- lhs shape = (30,7), rhs shape = (15,2,4), lhs_begin=0, lhs_end=1, rhs_begin=0, rhs_end=2, output shape = (15,2,7)
- lhs shape = (3, 5), rhs shape = (1,15,4), lhs_begin=0, lhs_end=2, rhs_begin=1, rhs_end=2, output shape = (15)

Negative indices are supported, and None can be used for either lhs_end or rhs_end to indicate the end of the range.

Example:

- lhs shape = (30, 12), rhs shape = (4, 2, 2, 3), lhs_begin=-1, lhs_end=None, rhs_begin=1, rhs_end=None, output shape = (30, 2, 2, 3)
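The slicing rules above amount to replacing a range of lhs dimensions with a range of rhs dimensions. A minimal sketch of that shape computation follows; `reshape_like_shape` is a hypothetical helper that works on shape tuples only and assumes negative indices are resolved by adding the number of dimensions.

```python
from math import prod

def reshape_like_shape(lhs_shape, rhs_shape,
                       lhs_begin=None, lhs_end=None, rhs_begin=None, rhs_end=None):
    """Return the output shape of reshape_like for the given dimension ranges."""
    def span(begin, end, ndim):
        b = 0 if begin is None else begin
        e = ndim if end is None else end
        if b < 0:
            b += ndim
        if e < 0:
            e += ndim
        return b, e

    lb, le = span(lhs_begin, lhs_end, len(lhs_shape))
    rb, re = span(rhs_begin, rhs_end, len(rhs_shape))
    # Keep lhs dims outside [lb, le) and substitute rhs dims [rb, re) in between.
    out = tuple(lhs_shape[:lb]) + tuple(rhs_shape[rb:re]) + tuple(lhs_shape[le:])
    if prod(out) != prod(lhs_shape):
        raise ValueError("reshape_like must preserve the number of elements")
    return out

print(reshape_like_shape((30, 12), (4, 2, 2, 3), lhs_begin=-1, rhs_begin=1))
```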

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L629

Parameters:

lhs (Symbol) – First input.
rhs (Symbol) – Second input.
lhs_begin (int or None, optional, default='None') – Defaults to 0. The beginning index along which the lhs dimensions are to be reshaped. Supports negative indices.
lhs_end (int or None, optional, default='None') – Defaults to None. The ending index along which the lhs dimensions are to be used for reshaping. Supports negative indices.
rhs_begin (int or None, optional, default='None') – Defaults to 0. The beginning index along which the rhs dimensions are to be used for reshaping. Supports negative indices.
rhs_end (int or None, optional, default='None') – Defaults to None. The ending index along which the rhs dimensions are to be used for reshaping. Supports negative indices.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.rnn(data=None, parameters=None, state=None, state_cell=None, sequence_length=None, state_size=_Null, num_layers=_Null, bidirectional=_Null, mode=_Null, p=_Null, state_outputs=_Null, projection_size=_Null, lstm_state_clip_min=_Null, lstm_state_clip_max=_Null, lstm_state_clip_nan=_Null, use_sequence_length=_Null, name=None, attr=None, out=None, **kwargs)¶

Applies recurrent layers to input data. Currently, vanilla RNN, LSTM and GRU are implemented, with both multi-layer and bidirectional support.

When the input data is of type float32 and the environment variables MXNET_CUDA_ALLOW_TENSOR_CORE and MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION are set to 1, this operator will try to use pseudo-float16 precision (float32 math with float16 I/O) in order to use Tensor Cores on suitable NVIDIA GPUs. This can sometimes give significant speedups.

Vanilla RNN

Applies a single-gate recurrent layer to input X. Two kinds of activation function are supported: ReLU and Tanh.

With ReLU activation function:

\[h_t = relu(W_{ih} * x_t + b_{ih} + W_{hh} * h_{(t-1)} + b_{hh})\]

With Tanh activation function:

\[h_t = \tanh(W_{ih} * x_t + b_{ih} + W_{hh} * h_{(t-1)} + b_{hh})\]

Reference paper: Finding structure in time - Elman, 1988. https://axon.cs.byu.edu/~martinez/classes/678/Papers/Elman_time.pdf

LSTM

Long Short-Term Memory - Hochreiter, 1997. http://www.bioinf.jku.at/publications/older/2604.pdf

\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

When projection_size is set, the LSTM uses a projection layer to reduce the number of parameters and gain some speed without significantly hurting accuracy.

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition - Sak et al. 2014. https://arxiv.org/abs/1402.1128

\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{ri} r_{(t-1)} + b_{ri}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{rf} r_{(t-1)} + b_{rf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{rg} r_{(t-1)} + b_{rg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ro} r_{(t-1)} + b_{ro}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ r_t = W_{hr} h_t \end{array}\end{split}\]

GRU

Gated Recurrent Unit - Cho et al. 2014. http://arxiv.org/abs/1406.1078

The definition of GRU here differs slightly from the paper but is compatible with cuDNN.

\[\begin{split}\begin{array}{ll} r_t = \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \mathrm{sigmoid}(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \\ \end{array}\end{split}\]
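As a rough illustration of the GRU equations above, the following sketch evaluates one time step for a hidden size of 1, so every weight and bias is a scalar. The function name `gru_step` and the dictionary layout of the parameters are assumptions made for this example; they do not reflect how the operator packs its `parameters` vector.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x_t, h_prev, p):
    """One GRU time step with scalar input, scalar state and scalar parameters."""
    r = sigmoid(p["W_ir"] * x_t + p["b_ir"] + p["W_hr"] * h_prev + p["b_hr"])
    z = sigmoid(p["W_iz"] * x_t + p["b_iz"] + p["W_hz"] * h_prev + p["b_hz"])
    # The reset gate multiplies the whole hidden term, including its bias b_hn.
    n = math.tanh(p["W_in"] * x_t + p["b_in"] + r * (p["W_hn"] * h_prev + p["b_hn"]))
    return (1.0 - z) * n + z * h_prev

# With all parameters at zero, r = z = 0.5 and n = 0, so the state is halved.
zero_params = {f"{kind}_{src}{gate}": 0.0
               for kind in ("W", "b") for src in ("i", "h") for gate in ("r", "z", "n")}
print(gru_step(1.0, 2.0, zero_params))
```

Placing `b_hn` inside the term scaled by `r_t` is the detail in which this formulation differs from applying the reset gate to the previous state alone.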

Defined in /home/smola/mxnet/src/operator/rnn.cc:L379

Parameters:

data (Symbol) – Input data to RNN
parameters (Symbol) – Vector of all RNN trainable parameters concatenated
state (Symbol) – initial hidden state of the RNN
state_cell (Symbol) – initial cell state for LSTM networks (only for LSTM)
sequence_length (Symbol) – Vector of valid sequence lengths for each element in batch. (Only used if use_sequence_length kwarg is True)
state_size (int (non-negative), required) – size of the state for each layer
num_layers (int (non-negative), required) – number of stacked layers
bidirectional (boolean, optional, default=0) – whether to use bidirectional recurrent layers
mode ({'gru', 'lstm', 'rnn_relu', 'rnn_tanh'}, required) – the type of RNN to compute
p (float, optional, default=0) – drop rate of the dropout on the outputs of each RNN layer, except the last layer.
state_outputs (boolean, optional, default=0) – Whether to have the states as symbol outputs.
projection_size (int or None, optional, default='None') – size of the projection
lstm_state_clip_min (double or None, optional, default=None) – Minimum clip value of LSTM states. This option must be used together with lstm_state_clip_max.
lstm_state_clip_max (double or None, optional, default=None) – Maximum clip value of LSTM states. This option must be used together with lstm_state_clip_min.
lstm_state_clip_nan (boolean, optional, default=0) – Whether to stop NaN from propagating in state by clipping it to min/max. If clipping range is not specified, this option is ignored.
use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.roi_pooling(data=None, rois=None, pooled_size=_Null, spatial_scale=_Null, name=None, attr=None, out=None, **kwargs)¶

Performs region of interest (ROI) pooling on the input array.

ROI pooling is a variant of a max pooling layer, in which the output size is fixed and region of interest is a parameter. Its purpose is to perform max pooling on the inputs of non-uniform sizes to obtain fixed-size feature maps. ROI pooling is a neural-net layer mostly used in training a Fast R-CNN network for object detection.

This operator takes a 4D feature map as an input array and region proposals as rois, then it pools over sub-regions of input and produces a fixed-sized output array regardless of the ROI size.

To crop the feature map accordingly, you can resize the bounding box coordinates by changing the parameters rois and spatial_scale.

The cropped feature maps are pooled by standard max pooling operation to a fixed size output indicated by a pooled_size parameter. batch_size will change to the number of region bounding boxes after ROIPooling.

The size of each region of interest doesn’t have to be perfectly divisible by the number of pooling sections(pooled_size).

Example:

x = [[[[  0.,   1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.],
       [ 18.,  19.,  20.,  21.,  22.,  23.],
       [ 24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.],
       [ 36.,  37.,  38.,  39.,  40.,  41.],
       [ 42.,  43.,  44.,  45.,  46.,  47.]]]]

// region of interest i.e. bounding box coordinates.
y = [[0,0,0,4,4]]

// returns array of shape (2,2) according to the given roi with max pooling.
ROIPooling(x, y, (2,2), 1.0) = [[[[ 14.,  16.],
                                  [ 26.,  28.]]]]

// region of interest is changed due to the change in `spatial_scale` parameter.
ROIPooling(x, y, (2,2), 0.7) = [[[[  7.,   9.],
                                  [ 19.,  21.]]]]
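The two results above can be reproduced with a small pure-Python sketch for a single channel of a single image. It assumes the scaled ROI corners are rounded to the nearest integer, the ROI is inclusive of both corners, and each output bin covers `floor` to `ceil` of its fractional extent; `roi_pool_sketch` is a hypothetical name and the sketch is not the operator's implementation.

```python
import math

def roi_pool_sketch(feature, roi, pooled_size, spatial_scale):
    """Max-pool one ROI [batch_index, x1, y1, x2, y2] of a 2-D feature map."""
    pooled_h, pooled_w = pooled_size
    height, width = len(feature), len(feature[0])
    # Scale the corners to feature-map coordinates (non-negative inputs assumed).
    x1, y1, x2, y2 = (int(math.floor(c * spatial_scale + 0.5)) for c in roi[1:])
    bin_h = max(y2 - y1 + 1, 1) / pooled_h
    bin_w = max(x2 - x1 + 1, 1) / pooled_w
    out = []
    for ph in range(pooled_h):
        h0 = min(max(math.floor(ph * bin_h) + y1, 0), height)
        h1 = min(max(math.ceil((ph + 1) * bin_h) + y1, 0), height)
        row = []
        for pw in range(pooled_w):
            w0 = min(max(math.floor(pw * bin_w) + x1, 0), width)
            w1 = min(max(math.ceil((pw + 1) * bin_w) + x1, 0), width)
            cells = [feature[h][w] for h in range(h0, h1) for w in range(w0, w1)]
            row.append(max(cells) if cells else 0.0)  # empty bins yield 0
        out.append(row)
    return out

# The 8 x 6 feature map from the example: values 0..47 in row-major order.
feature = [[float(6 * r + c) for c in range(6)] for r in range(8)]
print(roi_pool_sketch(feature, [0, 0, 0, 4, 4], (2, 2), 1.0))
print(roi_pool_sketch(feature, [0, 0, 0, 4, 4], (2, 2), 0.7))
```

Because bins use `floor` and `ceil`, neighbouring bins may overlap by one row or column when the ROI size is not divisible by the pooled size.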

Defined in /home/smola/mxnet/src/operator/roi_pooling.cc:L217

Parameters:

data (Symbol) – The input array to the pooling operator, a 4D feature map
rois (Symbol) – Bounding box coordinates, a 2D array of [[batch_index, x1, y1, x2, y2]], where (x1, y1) and (x2, y2) are top left and bottom right corners of designated region of interest. batch_index indicates the index of corresponding image in the input array
pooled_size (Shape(tuple), required) – ROI pooling output shape (h,w)
spatial_scale (float, required) – Ratio of input feature map height (or w) to raw image height (or w). Equals the reciprocal of total stride in convolutional layers
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.round_ste(data=None, name=None, attr=None, out=None, **kwargs)¶

Straight-through-estimator of round().

In forward pass, returns element-wise rounded value to the nearest integer of the input (same as round()).

In backward pass, returns gradients of 1 everywhere (instead of 0 everywhere as in round()): \(\frac{d}{dx}{round\_ste(x)} = 1\) vs. \(\frac{d}{dx}{round(x)} = 0\). This is useful for quantized training.

Reference: Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.

Example:

x = round_ste([-1.5, 1.5, -1.9, 1.9, 2.7])
x.backward()
x = [-2., 2., -2., 2., 3.]
x.grad() = [1., 1., 1., 1., 1.]

The storage type of round_ste output depends upon the input storage type:

round_ste(default) = default
round_ste(row_sparse) = row_sparse
round_ste(csr) = csr
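The forward and backward behaviour of a straight-through estimator can be sketched without any autograd machinery. The helper names below are hypothetical, and the sketch assumes that halves round away from zero, which agrees with the values in the example above.

```python
import math

def round_ste_forward(xs):
    """Forward pass: ordinary rounding (halves away from zero, by assumption)."""
    return [math.copysign(math.floor(abs(v) + 0.5), v) for v in xs]

def round_ste_backward(grad_output):
    """Backward pass: the incoming gradient is passed through unchanged."""
    return list(grad_output)

xs = [-1.5, 1.5, -1.9, 1.9, 2.7]
print(round_ste_forward(xs))
print(round_ste_backward([1.0] * len(xs)))
```

A true derivative of rounding is zero almost everywhere, so passing the gradient through is what lets parameters upstream of the rounding step keep learning.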

Defined in /home/smola/mxnet/src/operator/contrib/stes_op.cc:L54

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.scalar_poisson(lam=_Null, shape=_Null, ctx=_Null, dtype=_Null, name=None, attr=None, out=None, **kwargs)¶

Draw random samples from a Poisson distribution.

Samples are distributed according to a Poisson distribution parametrized by lambda (rate). Samples will always be returned as a floating point data type.

Example:

poisson(lam=4, shape=(2,2)) = [[ 5.,  2.],
                               [ 4.,  6.]]
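To make the sampling semantics concrete, here is a standard-library sketch that draws Poisson samples with Knuth's multiplication method and returns them as floats in a 2-D nested list. The sampling algorithm is swapped in purely for illustration and is not claimed to be the one the operator uses; `poisson_samples` and its `seed` argument are hypothetical.

```python
import math
import random

def poisson_samples(lam, shape, seed=None):
    """Draw a rows x cols grid of Poisson(lam) samples as floats (Knuth's method)."""
    rng = random.Random(seed)
    rows, cols = shape
    limit = math.exp(-lam)

    def draw():
        # Count how many uniforms can be multiplied before the product drops to exp(-lam).
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return float(k)
            k += 1

    return [[draw() for _ in range(cols)] for _ in range(rows)]

print(poisson_samples(lam=4, shape=(2, 2), seed=0))
```

Knuth's method takes time proportional to `lam` per sample, so it is only suitable for small rates.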

Defined in /home/smola/mxnet/src/operator/random/sample_op.cc:L152

Parameters:

lam (float, optional, default=1) – Lambda parameter (rate) of the Poisson distribution.
shape (Shape(tuple), optional, default=None) – Shape of the output.
ctx (string, optional, default='') – Context of output, in format [cpu|gpu|cpu_pinned](n). Only used for imperative calls.
dtype ({'None', 'bfloat16', 'float16', 'float32', 'float64'},optional, default='None') – DType of the output in case this can’t be inferred. Defaults to float32 if not defined (dtype=None).
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sequence_last(data=None, sequence_length=None, use_sequence_length=_Null, axis=_Null, name=None, attr=None, out=None, **kwargs)¶

Takes the last element of a sequence.

This function takes an n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] and returns a (n-1)-dimensional array of the form [batch_size, other_feature_dims].

Parameter sequence_length is used to handle variable-length sequences. sequence_length should be an input array of positive ints of dimension [batch_size]. To use this parameter, set use_sequence_length to True, otherwise each example in the batch is assumed to have the max sequence length.

Note

Alternatively, you can use the take operator.

Example:

x = [[[  1.,   2.,   3.],
      [  4.,   5.,   6.],
      [  7.,   8.,   9.]],

     [[ 10.,   11.,   12.],
      [ 13.,   14.,   15.],
      [ 16.,   17.,   18.]],

     [[  19.,   20.,   21.],
      [  22.,   23.,   24.],
      [  25.,   26.,   27.]]]

// returns last sequence when sequence_length parameter is not used
SequenceLast(x) = [[  19.,   20.,   21.],
                   [  22.,   23.,   24.],
                   [  25.,   26.,   27.]]

// sequence_length is used
SequenceLast(x, sequence_length=[1,1,1], use_sequence_length=True) =
         [[  1.,   2.,   3.],
          [  4.,   5.,   6.],
          [  7.,   8.,   9.]]

// sequence_length is used
SequenceLast(x, sequence_length=[1,2,3], use_sequence_length=True) =
         [[  1.,    2.,   3.],
          [  13.,  14.,  15.],
          [  25.,  26.,  27.]]
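The selection rule in the examples above can be written as a short reference sketch on nested lists, assuming the sequence axis is 0 so that `x[t][b]` is the feature vector of batch element `b` at time step `t`. The lowercase name `sequence_last` is used here as a plain Python helper, not as the operator itself.

```python
def sequence_last(x, sequence_length=None):
    """Pick, for each batch element b, the step at index sequence_length[b] - 1."""
    max_len, batch = len(x), len(x[0])
    if sequence_length is None:          # use_sequence_length=False
        sequence_length = [max_len] * batch
    return [list(x[sequence_length[b] - 1][b]) for b in range(batch)]

# The 3 x 3 x 3 input from the example: values 1..27.
x = [[[float(9 * t + 3 * b + k + 1) for k in range(3)] for b in range(3)]
     for t in range(3)]
print(sequence_last(x, [1, 2, 3]))
```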

Defined in /home/smola/mxnet/src/operator/sequence_last.cc:L103

Parameters:

data (Symbol) – n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] where n>2
sequence_length (Symbol) – vector of sequence lengths of the form [batch_size]
use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence
axis (int, optional, default='0') – The sequence axis. Only values of 0 and 1 are currently supported.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sequence_mask(data=None, sequence_length=None, use_sequence_length=_Null, value=_Null, axis=_Null, name=None, attr=None, out=None, **kwargs)¶

Sets all elements outside the sequence to a constant value.

This function takes an n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] and returns an array of the same shape.

Example:

x = [[[  1.,   2.,   3.],
      [  4.,   5.,   6.]],

     [[  7.,   8.,   9.],
      [ 10.,  11.,  12.]],

     [[ 13.,  14.,   15.],
      [ 16.,  17.,   18.]]]

// Batch 1
B1 = [[  1.,   2.,   3.],
      [  7.,   8.,   9.],
      [ 13.,  14.,  15.]]

// Batch 2
B2 = [[  4.,   5.,   6.],
      [ 10.,  11.,  12.],
      [ 16.,  17.,  18.]]

// works as identity operator when sequence_length parameter is not used
SequenceMask(x) = [[[  1.,   2.,   3.],
                    [  4.,   5.,   6.]],

                   [[  7.,   8.,   9.],
                    [ 10.,  11.,  12.]],

                   [[ 13.,  14.,   15.],
                    [ 16.,  17.,   18.]]]

// sequence_length [1,1] means 1 of each batch will be kept
// and other rows are masked with default mask value = 0
SequenceMask(x, sequence_length=[1,1], use_sequence_length=True) =
             [[[  1.,   2.,   3.],
               [  4.,   5.,   6.]],

              [[  0.,   0.,   0.],
               [  0.,   0.,   0.]],

              [[  0.,   0.,   0.],
               [  0.,   0.,   0.]]]

// sequence_length [2,3] means 2 of batch B1 and 3 of batch B2 will be kept
// and other rows are masked with value = 1
SequenceMask(x, sequence_length=[2,3], use_sequence_length=True, value=1) =
             [[[  1.,   2.,   3.],
               [  4.,   5.,   6.]],

              [[  7.,   8.,   9.],
               [  10.,  11.,  12.]],

              [[   1.,   1.,   1.],
               [  16.,  17.,  18.]]]
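The masking rule shown above can be summarised in a few lines of plain Python, assuming the sequence axis is 0 so that `x[t][b]` is the feature vector of batch element `b` at step `t`. The helper is a reference sketch on nested lists, not the operator.

```python
def sequence_mask(x, sequence_length=None, value=0.0):
    """Replace every step t >= sequence_length[b] of batch element b with `value`."""
    max_len, batch = len(x), len(x[0])
    if sequence_length is None:          # use_sequence_length=False: identity
        sequence_length = [max_len] * batch
    return [[list(x[t][b]) if t < sequence_length[b] else [value] * len(x[t][b])
             for b in range(batch)]
            for t in range(max_len)]

# The 3 x 2 x 3 input from the example: values 1..18.
x = [[[float(6 * t + 3 * b + k + 1) for k in range(3)] for b in range(2)]
     for t in range(3)]
print(sequence_mask(x, [2, 3], value=1.0))
```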

Defined in /home/smola/mxnet/src/operator/sequence_mask.cc:L186

Parameters:

data (Symbol) – n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] where n>2
sequence_length (Symbol) – vector of sequence lengths of the form [batch_size]
use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence
value (float, optional, default=0) – The value to be used as a mask.
axis (int, optional, default='0') – The sequence axis. Only values of 0 and 1 are currently supported.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sequence_reverse(data=None, sequence_length=None, use_sequence_length=_Null, axis=_Null, name=None, attr=None, out=None, **kwargs)¶

Reverses the elements of each sequence.

This function takes an n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] and returns an array of the same shape.

Example:

x = [[[  1.,   2.,   3.],
      [  4.,   5.,   6.]],

     [[  7.,   8.,   9.],
      [ 10.,  11.,  12.]],

     [[ 13.,  14.,   15.],
      [ 16.,  17.,   18.]]]

// Batch 1
B1 = [[  1.,   2.,   3.],
      [  7.,   8.,   9.],
      [ 13.,  14.,  15.]]

// Batch 2
B2 = [[  4.,   5.,   6.],
      [ 10.,  11.,  12.],
      [ 16.,  17.,  18.]]

// returns reverse sequence when sequence_length parameter is not used
SequenceReverse(x) = [[[ 13.,  14.,   15.],
                       [ 16.,  17.,   18.]],

                      [[  7.,   8.,   9.],
                       [ 10.,  11.,  12.]],

                      [[  1.,   2.,   3.],
                       [  4.,   5.,   6.]]]

// sequence_length [2,2] means 2 rows of
// both batch B1 and B2 will be reversed.
SequenceReverse(x, sequence_length=[2,2], use_sequence_length=True) =
                  [[[  7.,   8.,   9.],
                    [ 10.,  11.,  12.]],

                   [[  1.,   2.,   3.],
                    [  4.,   5.,   6.]],

                   [[ 13.,  14.,   15.],
                    [ 16.,  17.,   18.]]]

// sequence_length [2,3] means 2 of batch B1 and 3 of batch B2
// will be reversed.
SequenceReverse(x, sequence_length=[2,3], use_sequence_length=True) =
                 [[[  7.,   8.,   9.],
                   [ 16.,  17.,  18.]],

                  [[  1.,   2.,   3.],
                   [ 10.,  11.,  12.]],

                  [[ 13.,  14.,  15.],
                   [  4.,   5.,   6.]]]
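The per-batch reversal in the examples above can be expressed as a short reference sketch on nested lists, assuming the sequence axis is 0. Only the first `sequence_length[b]` steps of batch element `b` are reversed; later steps stay in place. The helper is illustrative and not the operator itself.

```python
def sequence_reverse(x, sequence_length=None):
    """Reverse the first sequence_length[b] steps of each batch element b."""
    max_len, batch = len(x), len(x[0])
    if sequence_length is None:          # use_sequence_length=False: reverse everything
        sequence_length = [max_len] * batch
    out = [[None] * batch for _ in range(max_len)]
    for b in range(batch):
        n = sequence_length[b]
        for t in range(max_len):
            src = n - 1 - t if t < n else t   # steps past the valid length are untouched
            out[t][b] = list(x[src][b])
    return out

# The 3 x 2 x 3 input from the example: values 1..18.
x = [[[float(6 * t + 3 * b + k + 1) for k in range(3)] for b in range(2)]
     for t in range(3)]
print(sequence_reverse(x, [2, 3]))
```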

Defined in /home/smola/mxnet/src/operator/sequence_reverse.cc:L118

Parameters:

data (Symbol) – n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] where n>2
sequence_length (Symbol) – vector of sequence lengths of the form [batch_size]
use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence
axis (int, optional, default='0') – The sequence axis. Only 0 is currently supported.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.shape_array(data=None, name=None, attr=None, out=None, **kwargs)¶

Returns a 1D int64 array containing the shape of data.

Example:

shape_array([[1,2,3,4], [5,6,7,8]]) = [2,4]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L698

Parameters:

data (Symbol) – Input Array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sigmoid(data=None, name=None, attr=None, out=None, **kwargs)¶

Computes sigmoid of x element-wise.

\[y = 1 / (1 + exp(-x))\]

Defined in /home/smola/mxnet/src/operator/numpy/np_elemwise_unary_op_basic.cc:L49

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sign_ste(data=None, name=None, attr=None, out=None, **kwargs)¶

Straight-through-estimator of sign().

In forward pass, returns element-wise sign of the input (same as sign()).

In backward pass, returns gradients of 1 everywhere (instead of 0 everywhere as in sign()): \(\frac{d}{dx}{sign\_ste(x)} = 1\) vs. \(\frac{d}{dx}{sign(x)} = 0\). This is useful for quantized training.

Reference: Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.

Example:

x = sign_ste([-2, 0, 3])
x.backward()
x = [-1., 0., 1.]
x.grad() = [1., 1., 1.]

The storage type of sign_ste output depends upon the input storage type:

sign_ste(default) = default
sign_ste(row_sparse) = row_sparse
sign_ste(csr) = csr

Defined in /home/smola/mxnet/src/operator/contrib/stes_op.cc:L80

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sldwin_atten_context(score=None, value=None, dilation=None, w=_Null, symmetric=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute the context vector for sliding window attention, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

In this attention pattern, given a fixed window size 2w, each token attends to w tokens on the left side if we use causal attention (setting symmetric to False), otherwise each token attends to w tokens on each side.
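The attention pattern described above can be made concrete by listing, for one query position, which key positions fall inside its window. This sketch only illustrates the pattern; `attended_positions` is a hypothetical helper, and it assumes that `dilation` is the stride between attended tokens and that positions outside the sequence are simply dropped.

```python
def attended_positions(i, seq_length, w, symmetric, dilation=1):
    """Key positions that query position i attends to under a sliding window."""
    # Symmetric: w tokens on each side plus the token itself; causal: left side only.
    offsets = range(-w, w + 1) if symmetric else range(-w, 1)
    positions = [i + k * dilation for k in offsets]
    return [p for p in positions if 0 <= p < seq_length]

print(attended_positions(5, 10, w=2, symmetric=True))
print(attended_positions(5, 10, w=2, symmetric=False))
```

In both modes the number of attended positions per token is bounded by the window size rather than by the sequence length, which is the source of the linear scaling in sliding window attention.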

The shapes of the inputs are:

- score :
- value : (batch_size, seq_length, num_heads, num_head_units)
- dilation : (num_heads,)

The shape of the output is:

- context_vec : (batch_size, seq_length, num_heads, num_head_units)

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L1045

Parameters:

score (Symbol) – score
value (Symbol) – value
dilation (Symbol) – dilation
w (int, required) – The one-sided window length
symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sldwin_atten_mask_like(score=None, dilation=None, valid_length=None, w=_Null, symmetric=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute the mask for the sliding window attention score, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

The shapes of the inputs are:

- score :
- dilation : (num_heads,)
- valid_length : (batch_size,)

The shape of the output is:

- mask : same as the shape of score

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L909

Parameters:

score (Symbol) – sliding window attention score
dilation (Symbol) – dilation
valid_length (Symbol) – valid length
w (int, required) – The one-sided window length
symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sldwin_atten_score(query=None, key=None, dilation=None, w=_Null, symmetric=_Null, name=None, attr=None, out=None, **kwargs)¶

Compute the sliding window attention score, which is used in Longformer (https://arxiv.org/pdf/2004.05150.pdf). In this attention pattern, given a fixed window size 2w, each token attends to w tokens on the left side if we use causal attention (setting symmetric to False), otherwise each token attends to w tokens on each side.

The shapes of the inputs are:

- query : (batch_size, seq_length, num_heads, num_head_units)
- key : (batch_size, seq_length, num_heads, num_head_units)
- dilation : (num_heads,)

The shape of the output is:

- score :

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L969

Parameters:

query (Symbol) – query
key (Symbol) – key
dilation (Symbol) – dilation
w (int, required) – The one-sided window length
symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.slice(data=None, begin=_Null, end=_Null, step=_Null, name=None, attr=None, out=None, **kwargs)¶

Slices a region of the array.

Note

crop is deprecated. Use slice instead.

This function returns a sliced array between the indices given by begin and end with the corresponding step.

For an input array of shape=(d_0, d_1, ..., d_n-1), slice operation with begin=(b_0, b_1, ..., b_m-1), end=(e_0, e_1, ..., e_m-1), and step=(s_0, s_1, ..., s_m-1), where m <= n, results in an array with the shape (|e_0-b_0|/|s_0|, ..., |e_m-1-b_m-1|/|s_m-1|, d_m, ..., d_n-1).

The resulting array’s k-th dimension contains elements from the k-th dimension of the input array starting from index b_k (inclusive) with step s_k until reaching e_k (exclusive).

If the k-th elements are None in the sequence of begin, end, and step, the following rule will be used to set default values. If s_k is None, set s_k=1. If s_k > 0, set b_k=0, e_k=d_k; else, set b_k=d_k-1, e_k=-1.

The storage type of slice output depends on storage types of inputs:

- slice(csr) = csr
- otherwise, slice generates output with default storage

Note

When input data storage type is csr, it only supports step=(), or step=(None,), or step=(1,) to generate a csr output. For other step parameter values, it falls back to slicing a dense tensor.

Example:

x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]
slice(x, begin=(0,1), end=(2,4)) = [[ 2.,  3.,  4.],
                                   [ 6.,  7.,  8.]]
slice(x, begin=(None, 0), end=(None, 3), step=(-1, 2)) = [[9., 11.],
                                                          [5.,  7.],
                                                          [1.,  3.]]
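The default rules for None entries match Python's own slicing defaults, so the examples above can be reproduced with a small recursive helper on nested lists. This is a reference sketch only; `nd_slice` is a hypothetical name and it handles dense nested lists, not sparse storage.

```python
def nd_slice(x, begin, end, step=()):
    """Slice the leading len(begin) axes of a nested list; trailing axes are kept whole."""
    if not begin:
        return x
    s = step[0] if step else None
    # Python's slice applies the same defaults for None as the rule described above.
    rows = x[begin[0]:end[0]:s]
    return [nd_slice(row, begin[1:], end[1:], step[1:]) for row in rows]

x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]]
print(nd_slice(x, begin=(0, 1), end=(2, 4)))
print(nd_slice(x, begin=(None, 0), end=(None, 3), step=(-1, 2)))
```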

Defined in /home/smola/mxnet/src/operator/tensor/matrix_op.cc:L535

Parameters:

data (Symbol) – Source input
begin (tuple of int or None, required) – starting indices for the slice operation, supports negative indices.
end (tuple of int or None, required) – ending indices for the slice operation, supports negative indices.
step (tuple of int or None, optional, default=[]) – step for the slice operation, supports negative values.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.slice_channel(data=None, num_outputs=_Null, axis=_Null, squeeze_axis=_Null, name=None, attr=None, out=None, **kwargs)¶

Splits an array along a particular axis into multiple sub-arrays.

Note

SliceChannel is deprecated. Use split instead.

Note that num_outputs should evenly divide the length of the axis along which to split the array.

Example:

x  = [[[ 1.]
       [ 2.]]
      [[ 3.]
       [ 4.]]
      [[ 5.]
       [ 6.]]]
x.shape = (3, 2, 1)

y = split(x, axis=1, num_outputs=2) // a list of 2 arrays with shape (3, 1, 1)
y = [[[ 1.]]
     [[ 3.]]
     [[ 5.]]]

    [[[ 2.]]
     [[ 4.]]
     [[ 6.]]]

y[0].shape = (3, 1, 1)

z = split(x, axis=0, num_outputs=3) // a list of 3 arrays with shape (1, 2, 1)
z = [[[ 1.]
      [ 2.]]]

    [[[ 3.]
      [ 4.]]]

    [[[ 5.]
      [ 6.]]]

z[0].shape = (1, 2, 1)

squeeze_axis=1 removes the axis with length 1 from the shapes of the output arrays. Note that setting squeeze_axis to 1 removes the length-1 axis only along the axis on which the array is split. Also, squeeze_axis can be set to true only if input.shape[axis] == num_outputs.

Example:

z = split(x, axis=0, num_outputs=3, squeeze_axis=1) // a list of 3 arrays with shape (2, 1)
z = [[ 1.]
     [ 2.]]

    [[ 3.]
     [ 4.]]

    [[ 5.]
     [ 6.]]
z[0].shape = (2, 1)
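The splitting and squeezing behaviour shown above can be checked with a small recursive helper on nested lists. It is a reference sketch under the stated rules (the axis length must be divisible by `num_outputs`, and squeezing requires pieces of length 1); the name `split_sketch` is hypothetical.

```python
def split_sketch(x, num_outputs, axis=1, squeeze_axis=False):
    """Split a nested list into num_outputs equal parts along the given axis."""
    if axis == 0:
        if len(x) % num_outputs != 0:
            raise ValueError("num_outputs must evenly divide the axis length")
        size = len(x) // num_outputs
        parts = [x[k * size:(k + 1) * size] for k in range(num_outputs)]
        if squeeze_axis:
            if size != 1:
                raise ValueError("squeeze_axis requires shape[axis] == num_outputs")
            parts = [p[0] for p in parts]   # drop the length-1 split axis
        return parts
    # Split every sub-array one level down, then regroup piece k of each sub-array.
    pieces = [split_sketch(row, num_outputs, axis - 1, squeeze_axis) for row in x]
    return [[p[k] for p in pieces] for k in range(num_outputs)]

x = [[[1.0], [2.0]], [[3.0], [4.0]], [[5.0], [6.0]]]   # shape (3, 2, 1)
print(split_sketch(x, 2, axis=1))
print(split_sketch(x, 3, axis=0, squeeze_axis=True))
```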

Defined in /home/smola/mxnet/src/operator/slice_channel.cc:L104

Parameters:

data (Symbol) – The input
num_outputs (int, required) – Number of splits. Note that this should evenly divide the length of the axis.
axis (int, optional, default='1') – Axis along which to split.
squeeze_axis (boolean, optional, default=0) – If true, removes the axis with length 1 from the shapes of the output arrays. Note that setting squeeze_axis to true removes the length-1 axis only along the axis on which the array is split. Also, squeeze_axis can be set to true only if input.shape[axis] == num_outputs.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.slice_like(data=None, shape_like=None, axes=_Null, name=None, attr=None, out=None, **kwargs)¶

Slices a region of the array like the shape of another array.

This function is similar to slice; however, the begin values are always 0 and the end values of specific axes are inferred from the second input shape_like.

Given the second shape_like input of shape=(d_0, d_1, ..., d_n-1), a slice_like operator with default empty axes performs the following operation:

out = slice(input, begin=(0, 0, ..., 0), end=(d_0, d_1, ..., d_n-1))

When axes is not empty, it is used to specify which axes are being sliced. Given a 4-d input data, a slice_like operator with axes=(0, 2, -1) will perform the following operation:

out = slice(input, begin=(0, 0, 0, 0), end=(d_0, None, d_2, d_3))

Note that the first and second inputs are allowed to have different dimensions; however, you have to make sure the axes are specified and do not exceed the dimension limits. For example, given input_1 with shape=(2,3,4,5) and input_2 with shape=(1,2,3), it is not allowed to use:

out = slice_like(a, b)

because ndim of input_1 is 4, and ndim of input_2 is 3. The following is allowed in this situation:

out = slice_like(a, b, axes=(0, 2))

Example:

x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]
y = [[  0.,   0.,   0.],
     [  0.,   0.,   0.]]
slice_like(x, y) = [[ 1.,  2.,  3.]
                    [ 5.,  6.,  7.]]
slice_like(x, y, axes=(0, 1)) = [[ 1.,  2.,  3.]
                                 [ 5.,  6.,  7.]]
slice_like(x, y, axes=(0)) = [[ 1.,  2.,  3.,  4.]
                              [ 5.,  6.,  7.,  8.]]
slice_like(x, y, axes=(-1)) = [[  1.,   2.,   3.]
                               [  5.,   6.,   7.]
                               [  9.,  10.,  11.]]
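
The example above can be reproduced with a small pure-Python reference. This is only a sketch for 2-D nested lists, not the operator's implementation; the helper name `slice_like_reference` is hypothetical, and it does not validate that the `shape_like` sizes fit inside the input.

```python
def slice_like_reference(data, shape_like, axes=()):
    """Illustrative 2-D reference for slice_like on nested lists."""
    ndim = 2
    data_shape = (len(data), len(data[0]))
    like_shape = (len(shape_like), len(shape_like[0]))

    # In Python, axes=(0) is just the int 0, so accept a bare int as well.
    if isinstance(axes, int):
        axes = (axes,)
    # Empty axes means "slice on all axes"; negative axes count from the end.
    selected = range(ndim) if len(axes) == 0 else [a % ndim for a in axes]

    # begin is always 0; end defaults to the full extent of the input.
    end = list(data_shape)
    for axis in selected:
        end[axis] = like_shape[axis]

    return [row[:end[1]] for row in data[:end[0]]]


x = [[1., 2., 3., 4.],
     [5., 6., 7., 8.],
     [9., 10., 11., 12.]]
y = [[0., 0., 0.],
     [0., 0., 0.]]

print(slice_like_reference(x, y))            # slice on both axes
print(slice_like_reference(x, y, axes=0))    # slice rows only
print(slice_like_reference(x, y, axes=-1))   # slice columns only
```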

Defined in /home/smola/mxnet/src/operator/tensor/matrix_op.cc:L681

Parameters:

data (Symbol) – Source input
shape_like (Symbol) – Shape like input
axes (Shape(tuple), optional, default=[]) – List of axes on which input data will be sliced according to the corresponding size of the second input. By default will slice on all axes. Negative axes are supported.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.smooth_l1(data=None, scalar=_Null, name=None, attr=None, out=None, **kwargs)¶

Calculates the Smooth L1 loss of each element of the input:

\[\begin{split}f(x) = \begin{cases} (\sigma x)^2/2,& \text{if }|x| < 1/\sigma^2\\ |x|-0.5/\sigma^2,& \text{otherwise} \end{cases}\end{split}\]

where \(x\) is an element of the input tensor data and \(\sigma\) is the scalar.

Example:

smooth_l1([1, 2, 3, 4]) = [0.5, 1.5, 2.5, 3.5]
smooth_l1([1, 2, 3, 4], scalar=1) = [0.5, 1.5, 2.5, 3.5]
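
The piecewise formula can be checked against the example with a scalar pure-Python sketch. It assumes the condition is applied to the absolute value of x (which is consistent with the example values); the helper name `smooth_l1_reference` is hypothetical.

```python
def smooth_l1_reference(x, sigma=1.0):
    """Smooth L1 value for a single number x with scale sigma."""
    sigma_sq = sigma * sigma
    if abs(x) < 1.0 / sigma_sq:
        # Quadratic region near zero: (sigma * x)^2 / 2
        return 0.5 * sigma_sq * x * x
    # Linear region: |x| - 0.5 / sigma^2
    return abs(x) - 0.5 / sigma_sq


print([smooth_l1_reference(v) for v in [1, 2, 3, 4]])
```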

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_binary_scalar_op_extended.cc:L138

Parameters:

data (Symbol) – source input
scalar (float) – scalar input
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.softmax(data=None, length=None, axis=_Null, temperature=_Null, dtype=_Null, use_length=_Null, name=None, attr=None, out=None, **kwargs)¶

Applies the softmax function.

The resulting array contains elements in the range (0,1) and the elements along the given axis sum up to 1.

\[softmax(\mathbf{z/t})_j = \frac{e^{z_j/t}}{\sum_{k=1}^K e^{z_k/t}}\]

for \(j = 1, ..., K\)

t is the temperature parameter of the softmax function. By default, t equals 1.0.

Example:

x = [[ 1.  1.  1.]
     [ 1.  1.  1.]]

softmax(x,axis=0) = [[ 0.5  0.5  0.5]
                     [ 0.5  0.5  0.5]]

softmax(x,axis=1) = [[ 0.33333334,  0.33333334,  0.33333334],
                     [ 0.33333334,  0.33333334,  0.33333334]]
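
A minimal pure-Python sketch of the formula, including the temperature t, is shown here for a single 1-D slice; applying it to each row or column corresponds to axis=1 or axis=0 in the example. The helper name `softmax_reference` is hypothetical, and subtracting the maximum is a standard numerical-stability step that is assumed here rather than taken from the operator's source.

```python
import math


def softmax_reference(values, temperature=1.0):
    """Softmax over a 1-D list of numbers with an optional temperature."""
    scaled = [v / temperature for v in values]
    # Subtracting the maximum does not change the result but avoids overflow.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


x = [[1., 1., 1.],
     [1., 1., 1.]]

along_axis_1 = [softmax_reference(row) for row in x]
# For axis=0, apply softmax to each column and transpose back.
along_axis_0 = [list(col) for col in zip(*[softmax_reference(c) for c in zip(*x)])]

print(along_axis_1)
print(along_axis_0)
```

A larger temperature flattens the distribution, while a smaller one sharpens it.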

Defined in /home/smola/mxnet/src/operator/nn/softmax.cc:L132

Parameters:

data (Symbol) – The input array.
length (Symbol) – The length array.
axis (int, optional, default='-1') – The axis along which to compute softmax.
temperature (double or None, optional, default=None) – Temperature parameter in softmax
dtype ({None, 'float16', 'float32', 'float64'},optional, default='None') – DType of the output in case this can’t be inferred. Defaults to the same as input’s dtype if not defined (dtype=None).
use_length (boolean or None, optional, default=0) – Whether to use the length input as a mask over the data input.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.softsign(data=None, name=None, attr=None, out=None, **kwargs)¶

Computes softsign of x element-wise.

\[y = x / (1 + abs(x))\]

The storage type of softsign output is always dense
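
As a quick check of the formula, a scalar pure-Python sketch (the helper name `softsign_reference` is hypothetical):

```python
def softsign_reference(x):
    """Softsign of a single number: x / (1 + |x|)."""
    return x / (1.0 + abs(x))


print([softsign_reference(v) for v in [-3.0, 0.0, 1.0]])
```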

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L294

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.stop_gradient(data=None, name=None, attr=None, out=None, **kwargs)¶

Stops gradient computation.

Stops the accumulated gradient of the inputs from flowing through this operator in the backward direction. In other words, this operator prevents the contribution of its inputs from being taken into account when computing gradients.

Example:

v1 = [1, 2]
v2 = [0, 1]
a = Variable('a')
b = Variable('b')
b_stop_grad = stop_gradient(3 * b)
loss = MakeLoss(b_stop_grad + a)

executor = loss.simple_bind(ctx=cpu(), a=(1,2), b=(1,2))
executor.forward(is_train=True, a=v1, b=v2)
executor.outputs
[ 1.  5.]

executor.backward()
executor.grad_arrays
[ 0.  0.]
[ 1.  1.]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L430

Parameters:

data (Symbol) – The input array.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.sync_batch_norm(data=None, gamma=None, beta=None, moving_mean=None, moving_var=None, eps=_Null, momentum=_Null, fix_gamma=_Null, use_global_stats=_Null, output_mean_var=_Null, ndev=_Null, key=_Null, name=None, attr=None, out=None, **kwargs)¶

Batch normalization.

Normalizes a data batch by mean and variance, and applies a scale gamma as well as an offset beta. The standard BN [1] implementation only normalizes the data within each device. SyncBN normalizes the input within the whole mini-batch. We follow the sync-once implementation described in the paper [2].

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:

\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]

Then compute the normalized output, which has the same shape as the input, as follows:

\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]

Both mean and var return a scalar by treating the input as a vector.

Assume the input has size k on axis 1; then both gamma and beta have shape (k,). If output_mean_var is set to true, the operator also outputs data_mean and data_var, which are needed for the backward pass.

The running statistics moving_mean and moving_var are updated as:

moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
moving_var = moving_var * momentum + data_var * (1 - momentum)

If use_global_stats is set to be true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.

Both gamma and beta are learnable parameters. However, if fix_gamma is true, gamma is set to 1 and its gradient to 0.
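
The formulas above can be sketched in pure Python for a 2-D input of shape (batch, channels). This is an illustration of the per-channel arithmetic only: it runs on a single device and does not model the cross-device synchronization or fix_gamma. The helper name `batch_norm_reference` is hypothetical, and the use of the population (biased) variance is an assumption.

```python
import math


def batch_norm_reference(data, gamma, beta, moving_mean, moving_var,
                         eps=1e-3, momentum=0.9, use_global_stats=False):
    """Normalize a (batch, channels) nested list along axis 1.

    moving_mean and moving_var are updated in place unless
    use_global_stats is true.
    """
    batch = len(data)
    channels = len(data[0])
    out = [[0.0] * channels for _ in range(batch)]
    for i in range(channels):
        column = [row[i] for row in data]
        if use_global_stats:
            mean, var = moving_mean[i], moving_var[i]
        else:
            mean = sum(column) / batch
            # Population variance (assumed for this sketch).
            var = sum((v - mean) ** 2 for v in column) / batch
            moving_mean[i] = moving_mean[i] * momentum + mean * (1 - momentum)
            moving_var[i] = moving_var[i] * momentum + var * (1 - momentum)
        scale = gamma[i] / math.sqrt(var + eps)
        for r in range(batch):
            out[r][i] = (data[r][i] - mean) * scale + beta[i]
    return out


data = [[1.0, 10.0],
        [3.0, 30.0]]
running_mean = [0.0, 0.0]
running_var = [1.0, 1.0]
normalized = batch_norm_reference(data, [1.0, 1.0], [0.0, 0.0],
                                  running_mean, running_var, eps=0.0)
print(normalized)
print(running_mean, running_var)
```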

References:

[1] Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” ICML 2015.

[2] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018.

Defined in /home/smola/mxnet/src/operator/contrib/sync_batch_norm.cc:L97

Parameters:

data (Symbol) – Input data to batch normalization
gamma (Symbol) – gamma array
beta (Symbol) – beta array
moving_mean (Symbol) – running mean of input
moving_var (Symbol) – running variance of input
eps (float, optional, default=0.00100000005) – Epsilon to prevent division by zero
momentum (float, optional, default=0.899999976) – Momentum for moving average
fix_gamma (boolean, optional, default=1) – Fix gamma while training
use_global_stats (boolean, optional, default=0) – Whether to use the global moving statistics instead of the local batch statistics. This forces batch-norm to act as a scale-shift operator.
output_mean_var (boolean, optional, default=0) – Output the mean and variance in addition to the normalized output
ndev (int, optional, default='1') – The count of GPU devices
key (string, required) – Hash key for synchronization. Set the same hash key for the same layer; Block.prefix is typically used, as in gluon.nn.contrib.SyncBatchNorm.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.tensor_poisson(lam=None, shape=_Null, dtype=_Null, name=None, attr=None, out=None, **kwargs)¶

Concurrent sampling from multiple Poisson distributions with parameters lambda (rate).

The parameters of the distributions are provided as an input array. Let [s] be the shape of the input array, n be the dimension of [s], [t] be the shape specified as the parameter of the operator, and m be the dimension of [t]. Then the output will be a (n+m)-dimensional array with shape [s]x[t].

For any valid n-dimensional index i with respect to the input array, output[i] will be an m-dimensional array that holds randomly drawn samples from the distribution which is parameterized by the input value at index i. If the shape parameter of the operator is not set, then one sample will be drawn per distribution and the output array has the same shape as the input array.

Samples will always be returned as a floating point data type.

Examples:

lam = [ 1.0, 8.5 ]

// Draw a single sample for each distribution
sample_poisson(lam) = [  0.,  13.]

// Draw a vector containing two samples for each distribution
sample_poisson(lam, shape=(2)) = [[  0.,   4.],
                                  [ 13.,   8.]]
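
The shape rule described above (output shape [s]x[t]) can be illustrated with a pure-Python sketch for a 1-D rate list and an integer shape. The helper names are hypothetical, and Knuth's multiplication method is used here only as a simple stand-in sampler; it is not claimed to be the algorithm the operator uses.

```python
import math
import random


def _poisson_sample(lam, rng):
    """Draw one Poisson sample with rate lam (Knuth's method)."""
    limit = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return float(count)
        count += 1


def tensor_poisson_reference(lam, shape=None, seed=None):
    """Sample from several Poisson distributions at once.

    lam is a 1-D list of rates; shape is None or an int giving the
    number of samples drawn per distribution.
    """
    rng = random.Random(seed)
    if shape is None:
        # One sample per distribution: output has the same shape as lam.
        return [_poisson_sample(rate, rng) for rate in lam]
    # output[i] holds `shape` samples from the distribution with rate lam[i].
    return [[_poisson_sample(rate, rng) for _ in range(shape)] for rate in lam]


single = tensor_poisson_reference([1.0, 8.5], seed=0)
pairs = tensor_poisson_reference([1.0, 8.5], shape=2, seed=0)
print(single)
print(pairs)
```

The drawn values depend on the seed and the sampler, so they will not match the numbers in the example; only the output shapes correspond.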

Defined in /home/smola/mxnet/src/operator/random/multisample_op.cc:L340

Parameters:

lam (Symbol) – Lambda (rate) parameters of the distributions.
shape (Shape(tuple), optional, default=[]) – Shape to be sampled from each random distribution.
dtype ({'None', 'float16', 'float32', 'float64'},optional, default='None') – DType of the output in case this can’t be inferred. Defaults to float32 if not defined (dtype=None).
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.topk(data=None, axis=_Null, k=_Null, ret_typ=_Null, is_ascend=_Null, dtype=_Null, name=None, attr=None, out=None, **kwargs)¶

Returns the indices of the top k elements in an input array along the given axis (by default). If ret_typ is set to ‘value’, returns the values of the top k elements instead of the indices. If ret_typ is ‘both’, both values and indices are returned. The returned elements are sorted.

Examples:

x = [[ 0.3,  0.2,  0.4],
     [ 0.1,  0.3,  0.2]]

// returns an index of the largest element on last axis
topk(x) = [[ 2.],
           [ 1.]]

// returns the value of top-2 largest elements on last axis
topk(x, ret_typ='value', k=2) = [[ 0.4,  0.3],
                                 [ 0.3,  0.2]]

// returns the value of top-2 smallest elements on last axis
topk(x, ret_typ='value', k=2, is_ascend=1) = [[ 0.2 ,  0.3],
                                             [ 0.1 ,  0.2]]

// returns the value of top-2 largest elements on axis 0
topk(x, axis=0, ret_typ='value', k=2) = [[ 0.3,  0.3,  0.4],
                                         [ 0.1,  0.2,  0.2]]

// returns a list of both values and indices of top-2 largest elements on last axis
topk(x, ret_typ='both', k=2) = [[[ 0.4,  0.3], [ 0.3,  0.2]] ,  [[ 2.,  0.], [ 1.,  2.]]]
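
The last-axis examples above can be reproduced with a pure-Python sketch for 2-D nested lists. The helper name `topk_last_axis` is hypothetical; it covers only the 'indices', 'value', and 'both' return types (not 'mask' or other axes), and it returns indices as floats to mirror the float32 default shown in the examples.

```python
def topk_last_axis(rows, k=1, ret_typ="indices", is_ascend=False):
    """Top-k along the last axis of a 2-D nested list."""
    values_out, indices_out = [], []
    for row in rows:
        # Sort positions by their values; largest first unless is_ascend.
        order = sorted(range(len(row)), key=row.__getitem__,
                       reverse=not is_ascend)[:k]
        indices_out.append([float(j) for j in order])
        values_out.append([row[j] for j in order])
    if ret_typ == "indices":
        return indices_out
    if ret_typ == "value":
        return values_out
    if ret_typ == "both":
        return [values_out, indices_out]
    raise ValueError("unsupported ret_typ in this sketch: %r" % (ret_typ,))


x = [[0.3, 0.2, 0.4],
     [0.1, 0.3, 0.2]]

print(topk_last_axis(x))
print(topk_last_axis(x, k=2, ret_typ="value"))
print(topk_last_axis(x, k=2, ret_typ="value", is_ascend=True))
print(topk_last_axis(x, k=2, ret_typ="both"))
```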

Defined in /home/smola/mxnet/src/operator/tensor/ordering_op.cc:L66

Parameters:

data (Symbol) – The input array
axis (int or None, optional, default='-1') – Axis along which to choose the top k indices. If not given, the flattened array is used. Default is -1.
k (int, optional, default='1') – Number of top elements to select. It should always be smaller than or equal to the number of elements along the given axis. A global sort is performed if k < 1.
ret_typ ({'both', 'indices', 'mask', 'value'},optional, default='indices') – The return type. “value” means to return the top k values, “indices” means to return the indices of the top k values, “mask” means to return a mask array containing 0 and 1, where 1 marks the top k values. “both” means to return a list of both values and indices of the top k elements.
is_ascend (boolean, optional, default=0) – Whether to choose k largest or k smallest elements. Top K largest elements will be chosen if set to false.
dtype ({'float16', 'float32', 'float64', 'int32', 'int64', 'uint8'},optional, default='float32') – DType of the output indices when ret_typ is “indices” or “both”. An error will be raised if the selected data type cannot precisely represent the indices.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.numpy_extension.while_loop(*data, **kwargs)¶

Runs a while loop with a user-defined condition and computation.

From: /home/smola/mxnet/src/operator/npx_control_flow.cc:1211

This function supports a variable number of positional inputs.

Parameters:

cond (Symbol) – Input graph for the loop condition.
func (Symbol) – Input graph for the loop body.
data (Symbol[]) – The input arrays that include data arrays and states.
num_outputs (int, required) – The number of outputs of the subgraph.
num_out_data (int, required) – The number of outputs from the function body.
max_iterations (int, required) – Maximum number of iterations.
cond_input_locs (tuple of <long>, required) – The locations of cond’s inputs in the given inputs.
func_input_locs (tuple of <long>, required) – The locations of func’s inputs in the given inputs.
func_var_locs (tuple of <long>, required) – The locations of loop_vars among func’s inputs.
name (string, optional.) – Name of the resulting symbol.

Returns:

The result symbol.

Return type:

Symbol

Modules

`image`	Image pre-processing operators.
`random`	Namespace for operators used in Gluon dispatched by F=symbol.