mxnet.ndarray.numpy_extension

Module for the ops not belonging to the official numpy package.

Functions

activation(data[, act_type])

Applies an activation function element-wise to the input.

add_n(*args, **kwargs)

Adds all input arguments element-wise.

arange_like(data[, start, step, repeat, ...])

Return an array with evenly spaced values.

batch_dot(a, b[, transpose_a, transpose_b, ...])

Batchwise dot product.

batch_flatten([data, out, name])

Flattens the input array into a 2-D array by collapsing the higher dimensions: an input of shape (d1, d2, ..., dk) becomes an output of shape (d1, d2*...*dk).

batch_norm(x, gamma, beta, running_mean, ...)

Batch normalization.

bipartite_matching([data, is_ascend, ...])

Compute bipartite matching.

box_decode([data, anchors, std0, std1, ...])

Decode bounding box training targets with normalized center offsets.

box_encode([samples, matches, anchors, ...])

Encode bounding box training targets with normalized center offsets.

box_iou([lhs, rhs, format, out, name])

Bounding box overlap of two arrays.

box_nms([data, overlap_thresh, ...])

Apply non-maximum suppression to input.

broadcast_greater([lhs, rhs, out, name])

Returns the result of element-wise greater than (>) comparison operation with broadcasting.

broadcast_like(lhs, rhs[, lhs_axes, rhs_axes])

Broadcasts lhs to have the same shape as rhs.

cast([data, dtype, out, name])

Casts all elements of the input to a new type.

cond(pred, then_func, else_func, inputs[, name])

Run an if-then-else using a user-defined condition and computation.

constraint_check([input, msg, out, name])

Checks whether all elements in a boolean tensor are true.

contrib_calibrate_entropy([hist, ...])

Provide calibrated min/max for input histogram.

contrib_quantize([data, min_range, ...])

Quantize an input tensor from float to out_type, with user-specified min_range and max_range.

contrib_quantize_v2([data, out_type, ...])

Quantize an input tensor from float to out_type, with user-specified min_calib_range and max_calib_range or the input range collected at runtime.

contrib_quantized_rnn([data, parameters, ...])

RNN operator for input data type of uint8.

convolution([data, weight, bias, kernel, ...])

Compute N-D convolution on (N+2)-D input.

ctc_loss([data, label, data_lengths, ...])

Connectionist Temporal Classification Loss.

deconvolution([data, weight, bias, kernel, ...])

Computes 1D, 2D or 3D transposed convolution (aka fractionally strided convolution) of the input tensor.

deformable_convolution([data, offset, ...])

Compute 2-D deformable convolution on 4-D input.

digamma([data, out, name])

Returns element-wise log derivative of the gamma function of the input.

dropout(data[, p, mode, axes, cudnn_off])

Applies dropout operation to input array.

embedding(data, weight[, input_dim, ...])

Maps integer indices to vector representations (embeddings).

erf([data, out, name])

Returns element-wise Gauss error function of the input.

erfinv([data, out, name])

Returns element-wise inverse Gauss error function of the input.

foreach(body, data, init_states[, name])

Run a for loop with user-defined computation over NDArrays on dimension 0.

fully_connected(x, weight[, bias, ...])

Applies a linear transformation: \(Y = XW^T + b\).

gamma([data, out, name])

Returns the gamma function (extension of the factorial function to the reals), computed element-wise on the input array.

gammaln([data, out, name])

Returns element-wise log of the absolute value of the gamma function of the input.

gather_nd([data, indices, out, name])

Gather elements or slices from data and store to a tensor whose shape is defined by indices.

group_norm(data, gamma, beta[, num_groups, ...])

Group normalization.

index_add([a, ind, val, out, name])

Add values to input according to given indexes.

index_update([a, ind, val, out, name])

Update values to input according to given indexes.

instance_norm([data, gamma, beta, eps, out, ...])

Applies instance normalization to the n-dimensional input array.

interleaved_matmul_encdec_qk([queries, ...])

Compute the matrix multiplication between the projections of queries and keys in multi-head attention used for encoder-decoder attention.

interleaved_matmul_encdec_valatt([...])

Compute the matrix multiplication between the projections of values and the attention weights in multi-head attention used for encoder-decoder attention.

interleaved_matmul_selfatt_qk([...])

Compute the matrix multiplication between the projections of queries and keys in multi-head attention used for self-attention.

interleaved_matmul_selfatt_valatt([...])

Compute the matrix multiplication between the projections of values and the attention weights in multi-head attention used for self-attention.

intgemm_fully_connected([data, weight, ...])

Multiply matrices using 8-bit integers.

intgemm_maxabsolute([data, out, name])

Compute the maximum absolute value in a tensor of float32 fast on a CPU.

intgemm_prepare_data([data, maxabs, out, name])

This operator quantizes float32 to int8 while also banning -128.

intgemm_prepare_weight([weight, maxabs, ...])

This operator converts a weight matrix in column-major format to intgemm's internal fast representation of weight matrices.

intgemm_take_weight([weight, indices, out, name])

Index a weight matrix stored in intgemm's weight format.

layer_norm([data, gamma, beta, axis, eps, ...])

Layer normalization.

leaky_relu([data, gamma, act_type, slope, ...])

Applies Leaky rectified linear unit activation element-wise to the input.

log_softmax(data[, axis, length, ...])

Computes the log softmax of the input.

masked_log_softmax(data, mask[, axis, ...])

Computes the masked log softmax of the input.

masked_softmax(data, mask[, axis, ...])

Applies the softmax function, masking elements according to the provided mask.

modulated_deformable_convolution([data, ...])

Compute 2-D modulated deformable convolution on 4-D input.

multibox_detection([cls_prob, loc_pred, ...])

Convert multibox detection predictions.

multibox_prior([data, sizes, ratios, clip, ...])

Generate prior (anchor) boxes from data, sizes and ratios.

multibox_target([anchor, label, cls_pred, ...])

Compute Multibox training targets.

nonzero([x, out, name])

Return the indices of the elements that are non-zero.

norm([data, ord, axis, out_dtype, keepdims, ...])

Computes the norm on an ndarray.

one_hot(data[, depth, on_value, off_value, ...])

Returns a one-hot array.

pad([data, mode, pad_width, constant_value, ...])

Pads an input array with a constant or edge values of the array.

pick(data, index[, axis, mode, keepdims])

Picks elements from an input array according to the input indices along the given axis.

pooling([data, kernel, stride, pad, ...])

Performs pooling on the input.

quantized_act([data, min_data, max_data, ...])

Activation operator for input and output data type of int8.

quantized_conv([data, weight, bias, ...])

Convolution operator for input, weight and bias data type of int8, and accumulates in type int32 for the output.

quantized_elemwise_add([lhs, rhs, lhs_min, ...])

elemwise_add operator for int8 inputs dataA and dataB; accumulates in int32 for the output.

quantized_elemwise_mul([lhs, rhs, lhs_min, ...])

Multiplies arguments int8 element-wise.

quantized_embedding([data, weight, ...])

Maps integer indices to int8 vector representations (embeddings).

quantized_flatten([data, min_data, ...])

quantized_fully_connected([data, weight, ...])

Fully Connected operator for input, weight and bias data type of int8, and accumulates in type int32 for the output.

quantized_npi_add([lhs, rhs, lhs_min, ...])

elemwise_add operator for int8 inputs dataA and dataB; accumulates in int32 for the output.

quantized_pooling([data, min_data, ...])

Pooling operator for input and output data type of int8.

quantized_reshape([data, min_data, ...])

quantized_transpose([data, min_data, ...])

relu([data, out, name])

Computes rectified linear activation.

requantize([data, min_range, max_range, ...])

Given data that is quantized in int32 and the corresponding thresholds, requantize the data into int8 using min and max thresholds either calculated at runtime or from calibration.

reshape([a, newshape, reverse, order, out, name])

Gives a new shape to an array without changing its data.

reshape_like([lhs, rhs, lhs_begin, lhs_end, ...])

Reshape some or all dimensions of lhs to have the same shape as some or all dimensions of rhs.

rnn([data, parameters, state, state_cell, ...])

Applies recurrent layers to input data.

roi_pooling([data, rois, pooled_size, ...])

Performs region of interest (ROI) pooling on the input array.

round_ste([data, out, name])

Straight-through-estimator of round().

scalar_poisson([lam, shape, ctx, dtype, ...])

Draw random samples from a Poisson distribution.

sequence_last([data, sequence_length, ...])

Takes the last element of a sequence.

sequence_mask([data, sequence_length, ...])

Sets all elements outside the sequence to a constant value.

sequence_reverse([data, sequence_length, ...])

Reverses the elements of each sequence.

shape_array([data, out, name])

Returns a 1D int64 array containing the shape of data.

sigmoid([data, out, name])

Computes sigmoid of x element-wise.

sign_ste([data, out, name])

Straight-through-estimator of sign().

sldwin_atten_context([score, value, ...])

Compute the context vector for sliding window attention, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

sldwin_atten_mask_like([score, dilation, ...])

Compute the mask for the sliding window attention score, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

sldwin_atten_score([query, key, dilation, ...])

Compute the sliding window attention score, which is used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

slice([data, begin, end, step, out, name])

Slices a region of the array.

slice_channel([data, num_outputs, axis, ...])

Splits an array along a particular axis into multiple sub-arrays.

slice_like([data, shape_like, axes, out, name])

Slices a region of the array like the shape of another array. This function is similar to slice, except that begin is always 0 and end for the specified axes is inferred from the second input, shape_like. Given a second input shape_like of shape=(d_0, d_1, ..., d_n-1) and the default empty axes, slice_like performs: out = slice(input, begin=(0, 0, ..., 0), end=(d_0, d_1, ..., d_n-1)). When axes is not empty, it specifies which axes are sliced. Given 4-D input data, slice_like with axes=(0, 2, -1) performs: out = slice(input, begin=(0, 0, 0, 0), end=(d_0, None, d_2, d_3)). The first and second inputs may have different numbers of dimensions, but then axes must be specified and must not exceed the dimension limits. For example, given input_1 with shape=(2,3,4,5) and input_2 with shape=(1,2,3), out = slice_like(a, b) is not allowed because input_1 has ndim 4 and input_2 has ndim 3, whereas out = slice_like(a, b, axes=(0, 2)) is allowed.

smooth_l1([data, scalar, out, name])

Calculate Smooth L1 Loss(lhs, scalar) by summing

softmax(data[, axis, length, temperature, ...])

Applies the softmax function.

softsign([data, out, name])

Computes softsign of x element-wise.

stop_gradient([data, out, name])

Stops gradient computation.

sync_batch_norm([data, gamma, beta, ...])

Batch normalization.

tensor_poisson([lam, shape, dtype, out, name])

Concurrent sampling from multiple Poisson distributions with parameters lambda (rate).

topk(data[, axis, k, ret_typ, is_ascend, dtype])

Returns the indices of the top k elements in an input array along the given axis.

while_loop(cond, func, loop_vars[, ...])

Run a while loop with user-defined computation and loop condition.

mxnet.ndarray.numpy_extension.activation(data, act_type='relu', **kwargs)

Applies an activation function element-wise to the input.

The following activation functions are supported:

  • log_sigmoid: \(y = log(\frac{1}{1 + exp(-x)})\)

  • mish: \(y = x * tanh(log(1 + exp(x)))\)

  • relu: Rectified Linear Unit, \(y = max(x, 0)\)

  • sigmoid: \(y = \frac{1}{1 + exp(-x)}\)

  • tanh: Hyperbolic tangent, \(y = \frac{exp(x) - exp(-x)}{exp(x) + exp(-x)}\)

  • softrelu: Soft ReLU, or SoftPlus, \(y = log(1 + exp(x))\)

  • softsign: \(y = \frac{x}{1 + abs(x)}\)
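The formulas above can be checked with a small scalar reference in plain Python. This is a sketch of the math only, not MXNet code: the helper names (activation_reference, ACTIVATIONS) are hypothetical, and the naive exp calls can overflow for large |x|.

```python
import math

# Naive scalar versions of the formulas listed above (may overflow for large |x|).
def log_sigmoid(x):
    # log(1 / (1 + exp(-x))) == -log(1 + exp(-x))
    return -math.log1p(math.exp(-x))

def softrelu(x):
    # SoftPlus: log(1 + exp(x))
    return math.log1p(math.exp(x))

def mish(x):
    # x * tanh(log(1 + exp(x)))
    return x * math.tanh(softrelu(x))

ACTIVATIONS = {
    "log_sigmoid": log_sigmoid,
    "mish": mish,
    "relu": lambda x: max(x, 0.0),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "softrelu": softrelu,
    "softsign": lambda x: x / (1.0 + abs(x)),
}

def activation_reference(values, act_type="relu"):
    """Apply the named activation element-wise to a flat list of floats."""
    f = ACTIVATIONS[act_type]
    return [f(v) for v in values]

print(activation_reference([-2.0, 0.0, 3.0], "relu"))      # [0.0, 0.0, 3.0]
print(activation_reference([-1.0, 0.0, 1.0], "softsign"))  # [-0.5, 0.0, 0.5]
```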

Parameters:
  • data (NDArray) – The input array.

  • act_type ({'log_sigmoid', 'mish', 'relu', 'sigmoid', 'softrelu', 'softsign', 'tanh'}, optional, default='relu') – Activation function to be applied.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

mxnet.ndarray.numpy_extension.add_n(*args, **kwargs)

Adds all input arguments element-wise.

\[add\_n(a_1, a_2, ..., a_n) = a_1 + a_2 + ... + a_n\]

add_n is potentially more efficient than calling add n times.

The storage type of the add_n output depends on the storage types of the inputs:

  • add_n(row_sparse, row_sparse, ..) = row_sparse

  • add_n(default, csr, default) = default

  • add_n(any input combinations longer than 4 (>4) with at least one default type) = default

  • otherwise, add_n falls back to default storage for all inputs and produces default-storage output

Defined in src/operator/tensor/elemwise_sum.cc:L158

Parameters:
  • args (ndarray[]) – Positional input arguments

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
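The definition add_n(a_1, ..., a_n) = a_1 + ... + a_n gives the same values as chaining binary additions; any efficiency gain comes from doing the work in one operator call. The NumPy illustration below only demonstrates that equivalence and does not call MXNet.

```python
from functools import reduce

import numpy as np

# Four (2, 3) inputs filled with 1, 2, 3 and 4.
arrays = [np.full((2, 3), float(i)) for i in range(1, 5)]

fused = np.sum(np.stack(arrays), axis=0)  # one reduction over all inputs
pairwise = reduce(np.add, arrays)         # ((a1 + a2) + a3) + a4

print(fused)  # every element is 1 + 2 + 3 + 4 = 10.0
assert np.array_equal(fused, pairwise)
```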

mxnet.ndarray.numpy_extension.arange_like(data, start=0.0, step=1.0, repeat=1, ctx=None, axis=None)

Return an array with evenly spaced values. If axis is not given, the output will have the same shape as the input array. Otherwise, the output will be a 1-D array with size of the specified axis in input shape.
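As a rough model of this description (and of the start, step, repeat and axis parameters listed below), element i of the flattened output can be taken as start + step * (i // repeat). The NumPy helper arange_like_reference is hypothetical; the repeat behavior is inferred from the parameter text, so verify it against npx.arange_like before relying on it.

```python
import numpy as np

def arange_like_reference(data, start=0.0, step=1.0, repeat=1, axis=None):
    """Assumed semantics: flat element i equals start + step * (i // repeat)."""
    if axis is None:
        shape = data.shape            # same shape as the input
    else:
        shape = (data.shape[axis],)   # 1-D, sized like the chosen axis (negative allowed)
    n = int(np.prod(shape))
    idx = np.arange(n) // repeat      # each value appears `repeat` times in a row
    return (start + step * idx).reshape(shape)

x = np.zeros((3, 4))
print(arange_like_reference(x))               # 0..11 laid out as a 3x4 array
print(arange_like_reference(x, axis=-1))      # [0. 1. 2. 3.]
print(arange_like_reference(x, repeat=3)[0])  # [0. 0. 0. 1.]
```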

Parameters:
  • data (NDArray) – The input

  • start (double, optional, default=0) – Start of interval. The interval includes this value. The default start value is 0.

  • step (double, optional, default=1) – Spacing between values.

  • repeat (int, optional, default='1') – The number of times each element is repeated. For example, with repeat=3, element a is repeated three times: a, a, a.

  • ctx (string, optional, default='') – Context of output, in format [cpu|gpu|cpu_pinned](n). Only used for imperative calls.

  • axis (int or None, optional, default='None') – Arange elements according to the size of a certain axis of the input array. Negative values count backward from the last axis. If not provided, elements are arranged according to the full input shape.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Example

>>> x = np.random.uniform(0, 1, size=(3,4))
>>> x
array([[0.5488135 , 0.5928446 , 0.71518934, 0.84426576],
       [0.60276335, 0.8579456 , 0.5448832 , 0.8472517 ],
       [0.4236548 , 0.6235637 , 0.6458941 , 0.3843817 ]])
>>> npx.arange_like(x, start=0)
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])
>>> npx.arange_like(x, start=0, axis=-1)
array([0., 1., 2., 3.])

mxnet.ndarray.numpy_extension.batch_dot(a, b, transpose_a=False, transpose_b=False, forward_stype='default')

Batchwise dot product.

batch_dot computes the dot product of x and y when x and y are batches of data, namely N-D (N >= 3) arrays of shape (B_0, …, B_i, :, :).

For example, given x with shape (B_0, …, B_i, N, M) and y with shape (B_0, …, B_i, M, K), the result array will have shape (B_0, …, B_i, N, K), which is computed by:

batch_dot(x,y)[b_0, ..., b_i, :, :] = dot(x[b_0, ..., b_i, :, :], y[b_0, ..., b_i, :, :])

Parameters:
  • a (NDArray) – The first input

  • b (NDArray) – The second input

  • transpose_a (boolean, optional, default=0) – If true then transpose the first input before dot.

  • transpose_b (boolean, optional, default=0) – If true then transpose the second input before dot.

  • forward_stype ({None, 'csr', 'default', 'row_sparse'}, optional, default='default') – The desired storage type of the forward output, given by the user. If the combination of input storage types and this hint does not match any implemented ones, the dot operator will perform a fallback operation and still produce an output of the desired storage type.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
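The batched formula above is the same contraction NumPy's matmul performs on stacked matrices: every axis except the last two is treated as a batch axis. The sketch below (the helper batch_dot_reference is hypothetical, not MXNet code) also shows transpose_a / transpose_b as swaps of the last two axes.

```python
import numpy as np

def batch_dot_reference(x, y, transpose_a=False, transpose_b=False):
    """One matrix product per leading batch index, as in the formula above."""
    if transpose_a:
        x = np.swapaxes(x, -1, -2)  # transpose only the last two axes
    if transpose_b:
        y = np.swapaxes(y, -1, -2)
    return np.matmul(x, y)          # matmul broadcasts over the batch axes

B, N, M, K = 2, 3, 4, 5
rng = np.random.default_rng(0)
x = rng.standard_normal((B, N, M))
y = rng.standard_normal((B, M, K))
out = batch_dot_reference(x, y)
print(out.shape)  # (2, 3, 5)

# Same result as looping over the batch and calling dot on each slice.
looped = np.stack([x[b] @ y[b] for b in range(B)])
assert np.allclose(out, looped)
```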

mxnet.ndarray.numpy_extension.batch_flatten(data=None, out=None, name=None, **kwargs)

Flattens the input array into a 2-D array by collapsing the higher dimensions.

Note

Flatten (capitalized) is deprecated. Use flatten instead.

For an input array with shape (d1, d2, ..., dk), the flatten operation reshapes the input array into an output array of shape (d1, d2*...*dk). Note that the behavior of this function is different from numpy.ndarray.flatten, which behaves similarly to mxnet.ndarray.reshape((-1,)).

Example:

x = [[[1,2,3],
      [4,5,6],
      [7,8,9]],
     [[1,2,3],
      [4,5,6],
      [7,8,9]]]

flatten(x) = [[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.],
              [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]]

Defined in src/operator/tensor/matrix_op.cc:L278

Parameters:
  • data (ndarray) – Input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
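The shape rule (d1, d2, ..., dk) to (d1, d2*...*dk) corresponds to a plain NumPy reshape that keeps the first axis. This illustration uses NumPy only and contrasts it with numpy.ndarray.flatten, which collapses everything to 1-D.

```python
import numpy as np

x = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=np.float32)

# Keep the first axis and collapse all remaining axes into one.
flat2d = x.reshape(x.shape[0], -1)
print(flat2d.shape)  # (2, 9)

# numpy.ndarray.flatten, by contrast, returns a 1-D array.
print(x.flatten().shape)  # (18,)
```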

mxnet.ndarray.numpy_extension.batch_norm(x, gamma, beta, running_mean, running_var, eps=0.001, momentum=0.9, fix_gamma=True, use_global_stats=False, output_mean_var=False, axis=1, cudnn_off=False, min_calib_range=None, max_calib_range=None, **kwargs)

Batch normalization.

Normalizes a data batch by mean and variance, and applies a scale gamma as well as offset beta.

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:

\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]

Then compute the normalized output, which has the same shape as the input, as follows:

\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]

Both mean and var return a scalar by treating the input as a vector.

Assume the input has size k on axis 1; then both gamma and beta have shape (k,). If output_mean_var is true, the operator also outputs data_mean and the inverse of data_var, which are needed for the backward pass. Note that the gradients of these two outputs are blocked.

Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var, which are k-length vectors. They are global statistics for the whole dataset, which are updated by:

moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
moving_var = moving_var * momentum + data_var * (1 - momentum)
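The training-mode formulas above can be sketched in NumPy: statistics are reduced over every axis except the channel axis, broadcast back along it, and then folded into the moving averages. The helper batch_norm_train_reference is hypothetical, and using the biased (population) variance is an assumption. gamma is passed explicitly here, whereas the operator with the default fix_gamma=True would treat it as 1.

```python
import numpy as np

def batch_norm_train_reference(data, gamma, beta, moving_mean, moving_var,
                               eps=1e-3, momentum=0.9, axis=1):
    """Training-mode batch norm plus moving-statistic update (illustrative only)."""
    axis = axis % data.ndim                      # allow axis=-1
    reduce_axes = tuple(i for i in range(data.ndim) if i != axis)
    bshape = [1] * data.ndim                     # broadcasts a (k,) vector along the channel axis
    bshape[axis] = -1

    data_mean = data.mean(axis=reduce_axes)
    data_var = data.var(axis=reduce_axes)        # biased variance (assumption)
    normed = (data - data_mean.reshape(bshape)) / np.sqrt(data_var.reshape(bshape) + eps)
    out = normed * gamma.reshape(bshape) + beta.reshape(bshape)

    new_mean = moving_mean * momentum + data_mean * (1 - momentum)
    new_var = moving_var * momentum + data_var * (1 - momentum)
    return out, new_mean, new_var

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(8, 3, 4, 4))
k = data.shape[1]
out, new_mean, new_var = batch_norm_train_reference(
    data, np.ones(k), np.zeros(k), np.zeros(k), np.ones(k))
print(out.mean(axis=(0, 2, 3)))  # close to 0 for every channel
```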

If use_global_stats is set to be true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.

The parameter axis specifies which axis of the input shape denotes the ‘channel’ (separately normalized groups). The default is 1. Specifying -1 sets the channel axis to be the last item in the input shape.

Both gamma and beta are learnable parameters. However, if fix_gamma is true, gamma is set to 1 and its gradient to 0.

Note

When fix_gamma is set to True, no sparse support is provided. If fix_gamma is set to False, sparse tensors will fall back to the default implementation.

Parameters:
  • x (NDArray) – Input data to batch normalization

  • gamma (NDArray) – gamma array

  • beta (NDArray) – beta array

  • running_mean (NDArray) – running mean of input (the moving_mean auxiliary state)

  • running_var (NDArray) – running variance of input (the moving_var auxiliary state)

  • eps (double, optional, default=0.001) – Epsilon to prevent division by zero. Must be no less than CUDNN_BN_MIN_EPSILON, defined in cudnn.h, when using cuDNN (usually 1e-5).

  • momentum (float, optional, default=0.9) – Momentum for the moving average

  • fix_gamma (boolean, optional, default=1) – Fix gamma while training

  • use_global_stats (boolean, optional, default=0) – Whether to use global moving statistics instead of local batch statistics. This turns batch norm into a scale-shift operator.

  • output_mean_var (boolean, optional, default=0) – Output the mean and inverse std

  • axis (int, optional, default='1') – Specifies which axis of the input shape is the channel axis.

  • cudnn_off (boolean, optional, default=0) – Do not select the cuDNN operator, if available.

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it is used by the quantized batch norm op to calculate the primitive scale. Note: this calibration range applies to the batch norm output.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it is used by the quantized batch norm op to calculate the primitive scale. Note: this calibration range applies to the batch norm output.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

mxnet.ndarray.numpy_extension.bipartite_matching(data=None, is_ascend=_Null, threshold=_Null, topk=_Null, out=None, name=None, **kwargs)

Compute bipartite matching.

The matching is performed on a score matrix of shape [B, N, M], where:

  • B: batch size

  • N: number of rows to match

  • M: number of columns, used as the reference to match against

The operator returns two arrays:

  • x: matched column indices; -1 indicates an unmatched row

  • y: matched row indices

Note:

Zero gradients are back-propagated in this op for now.

Example:

s = [[0.5, 0.6], [0.1, 0.2], [0.3, 0.4]]
x, y = bipartite_matching(s, threshold=1e-12, is_ascend=False)
x = [1, -1, 0]
y = [2, 0]
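The example above is reproduced by a greedy strategy: repeatedly take the best remaining score whose row and column are both still free. The pure-Python helper below (bipartite_matching_reference, hypothetical) handles a single (N, M) score matrix; that MXNet uses exactly this greedy order is an assumption consistent with the example, not something stated here.

```python
def bipartite_matching_reference(scores, threshold, is_ascend=False, topk=-1):
    """Greedy matching sketch for one (N, M) score matrix.

    Returns (row_to_col, col_to_row), with -1 marking unmatched entries.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    entries = [(scores[r][c], r, c) for r in range(n_rows) for c in range(n_cols)]
    entries.sort(key=lambda e: e[0], reverse=not is_ascend)  # best score first

    row_to_col = [-1] * n_rows
    col_to_row = [-1] * n_cols
    matched = 0
    for score, r, c in entries:
        # Ignore scores on the wrong side of the threshold.
        if (not is_ascend and score < threshold) or (is_ascend and score > threshold):
            continue
        if row_to_col[r] != -1 or col_to_row[c] != -1:
            continue  # row or column already taken
        row_to_col[r] = c
        col_to_row[c] = r
        matched += 1
        if topk > 0 and matched >= topk:
            break
    return row_to_col, col_to_row

s = [[0.5, 0.6], [0.1, 0.2], [0.3, 0.4]]
print(bipartite_matching_reference(s, threshold=1e-12))  # ([1, -1, 0], [2, 0])
```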

Defined in src/operator/contrib/bounding_box.cc:L191

Parameters:
  • data (ndarray) – The input

  • is_ascend (boolean, optional, default=0) – Use ascend order for scores instead of descending. Please set threshold accordingly.

  • threshold (float, required) – Ignore matching when score < thresh, if is_ascend=false, or ignore score > thresh, if is_ascend=true.

  • topk (int, optional, default='-1') – Limit the number of matches to topk, set -1 for no limit

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.box_decode(data=None, anchors=None, std0=_Null, std1=_Null, std2=_Null, std3=_Null, clip=_Null, format=_Null, out=None, name=None, **kwargs)

Decode bounding box training targets with normalized center offsets.

Input bounding boxes use either corner encoding [xmin, ymin, xmax, ymax] or center encoding [x, y, width, height].

Defined in src/operator/contrib/bounding_box.cc:L249

Parameters:
  • data (ndarray) – (B, N, 4) predicted bbox offset

  • anchors (ndarray) – (1, N, 4) encoded in corner or center

  • std0 (float, optional, default=1) – value to be divided from the 1st encoded values

  • std1 (float, optional, default=1) – value to be divided from the 2nd encoded values

  • std2 (float, optional, default=1) – value to be divided from the 3rd encoded values

  • std3 (float, optional, default=1) – value to be divided from the 4th encoded values

  • clip (float, optional, default=-1) – If larger than 0, bounding box target will be clipped to this value.

  • format ({'center', 'corner'},optional, default='center') –

    The box encoding type.

    “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.box_encode(samples=None, matches=None, anchors=None, refs=None, means=None, stds=None, out=None, name=None, **kwargs)

Encode bounding box training targets with normalized center offsets.

Input bounding boxes use corner encoding [xmin, ymin, xmax, ymax].

Defined in src/operator/contrib/bounding_box.cc:L220

Parameters:
  • samples (ndarray) – (B, N) value +1 (positive), -1 (negative), 0 (ignore)

  • matches (ndarray) – (B, N) value range [0, M)

  • anchors (ndarray) – (B, N, 4) encoded in corner

  • refs (ndarray) – (B, M, 4) encoded in corner

  • means (ndarray) – (4,) Mean value to be subtracted from encoded values

  • stds (ndarray) – (4,) Std value to be divided from encoded values

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
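This text only says "normalized center offsets", so the sketch below assumes the widely used parameterization: tx = (gx - ax) / aw, ty = (gy - ay) / ah, tw = log(gw / aw), th = log(gh / ah), followed by subtracting a mean and dividing by a std per coordinate, as the means/stds parameters describe. Decoding undoes those steps. All helper names are hypothetical, and samples, matches, batching and clipping are not modeled, so check the details against the MXNet source.

```python
import math

# Hypothetical helpers for one anchor/reference pair of corner boxes.
def corner_to_center(box):
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)

def center_to_corner(box):
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

def encode_one(anchor, ref, means=(0.0, 0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0, 1.0)):
    """Offsets of corner box `ref` relative to corner box `anchor`."""
    ax, ay, aw, ah = corner_to_center(anchor)
    gx, gy, gw, gh = corner_to_center(ref)
    raw = ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))
    # Subtract the mean and divide by the std, per coordinate.
    return tuple((t - m) / s for t, m, s in zip(raw, means, stds))

def decode_one(anchor, offsets, stds=(1.0, 1.0, 1.0, 1.0)):
    """Invert encode_one (zero means assumed) and return a corner box."""
    ax, ay, aw, ah = corner_to_center(anchor)
    tx, ty, tw, th = (t * s for t, s in zip(offsets, stds))
    center = (tx * aw + ax, ty * ah + ay, math.exp(tw) * aw, math.exp(th) * ah)
    return center_to_corner(center)

anchor = (0.1, 0.1, 0.5, 0.5)
ref = (0.2, 0.15, 0.6, 0.45)
stds = (0.1, 0.1, 0.2, 0.2)
offsets = encode_one(anchor, ref, stds=stds)
print(decode_one(anchor, offsets, stds=stds))  # approximately (0.2, 0.15, 0.6, 0.45)
```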

mxnet.ndarray.numpy_extension.box_iou(lhs=None, rhs=None, format=_Null, out=None, name=None, **kwargs)

Bounding box overlap of two arrays.

The overlap is defined as Intersection-over-Union (IoU). Shapes:

  • lhs: (a_1, a_2, …, a_n, 4) array

  • rhs: (b_1, b_2, …, b_n, 4) array

  • output: (a_1, a_2, …, a_n, b_1, b_2, …, b_n) array

Note:

Zero gradients are back-propagated in this op for now.

Example:

x = [[0.5, 0.5, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]]
y = [[0.25, 0.25, 0.75, 0.75]]
box_iou(x, y, format='corner') = [[0.1428], [0.1428]]
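The numbers in this example follow from the IoU definition: each box in x overlaps y in a 0.25 x 0.25 square (area 0.0625), and each box has area 0.25, so IoU = 0.0625 / (0.25 + 0.25 - 0.0625) = 1/7. A pure-Python sketch for corner-format boxes (helper names hypothetical, not MXNet code):

```python
def iou_corner(a, b):
    """IoU of two corner-format boxes [xmin, ymin, xmax, ymax]."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)  # overlap width (0 if disjoint)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)  # overlap height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def box_iou_reference(lhs, rhs):
    """Pairwise IoU: out[i][j] = IoU(lhs[i], rhs[j])."""
    return [[iou_corner(a, b) for b in rhs] for a in lhs]

x = [[0.5, 0.5, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]]
y = [[0.25, 0.25, 0.75, 0.75]]
print(box_iou_reference(x, y))  # both entries are 1/7, about 0.1428
```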

Defined in src/operator/contrib/bounding_box.cc:L144

Parameters:
  • lhs (ndarray) – The first input

  • rhs (ndarray) – The second input

  • format ({'center', 'corner'},optional, default='corner') –

    The box encoding type.

    “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.box_nms(data=None, overlap_thresh=_Null, valid_thresh=_Null, topk=_Null, coord_start=_Null, score_index=_Null, id_index=_Null, background_id=_Null, force_suppress=_Null, in_format=_Null, out_format=_Null, out=None, name=None, **kwargs)

Apply non-maximum suppression to input.

The output is sorted in descending order of score. Boxes that overlap a higher-scoring box by more than overlap_thresh, as well as background boxes, are removed and filled with -1; the corresponding positions are recorded for back-propagation.

During back-propagation, the gradient is copied to the original position according to the input index. For positions that have been suppressed, in_grad is assigned 0. In summary, gradients stick to their boxes: each gradient is either moved or discarded according to its box's original index in the input.

Input requirements:

1. The input tensor has at least 2 dimensions, (n, k); any higher dimensions are regarded as batch dimensions, e.g. (a, b, c, d, n, k) == (a*b*c*d, n, k).
2. n is the number of boxes in each batch.
3. k is the width of each box item.

By default, a box is [id, score, xmin, ymin, xmax, ymax, …], additional elements are allowed.

  • id_index: optional, use -1 to ignore. Useful when force_suppress=False, in which case highly overlapping boxes are not suppressed if they belong to different classes (e.g. one is an apple and the other a car).

  • background_id: optional, default=-1, class id for background boxes. Useful when id_index >= 0; boxes with the background id are filtered out before NMS.

  • coord_start: default=2, the starting index of the 4 coordinates. Two formats are supported:

    • corner: [xmin, ymin, xmax, ymax]

    • center: [x, y, width, height]

  • score_index: default=1, the index of the box score/confidence. When two boxes overlap with IoU > overlap_thresh, the one with the smaller score is suppressed.

  • in_format and out_format: default=’corner’, specify in/out box formats.

Examples:

x = [[0, 0.5, 0.1, 0.1, 0.2, 0.2], [1, 0.4, 0.1, 0.1, 0.2, 0.2],
     [0, 0.3, 0.1, 0.1, 0.14, 0.14], [2, 0.6, 0.5, 0.5, 0.7, 0.8]]
box_nms(x, overlap_thresh=0.1, coord_start=2, score_index=1, id_index=0,
    force_suppress=True, in_format='corner', out_format='corner') =
    [[2, 0.6, 0.5, 0.5, 0.7, 0.8], [0, 0.5, 0.1, 0.1, 0.2, 0.2],
     [-1, -1, -1, -1, -1, -1], [-1, -1, -1, -1, -1, -1]]
out_grad = [[0.1, 0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
            [0.3, 0.3, 0.3, 0.3, 0.3, 0.3], [0.4, 0.4, 0.4, 0.4, 0.4, 0.4]]
# exe.backward
in_grad = [[0.2, 0.2, 0.2, 0.2, 0.2, 0.2], [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0], [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
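The forward and backward results in this example can be reproduced with greedy NMS: visit boxes by descending score and keep a box unless it overlaps an already-kept box (of the same class, unless force_suppress) by more than overlap_thresh. The pure-Python helper below is a hypothetical illustration for one batch of corner-format boxes; topk and background_id are not modeled, and agreement with MXNet beyond this example is an assumption.

```python
def iou_corner(a, b):
    """IoU of two corner-format boxes [xmin, ymin, xmax, ymax]."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def box_nms_reference(boxes, overlap_thresh=0.5, valid_thresh=0.0, coord_start=2,
                      score_index=1, id_index=-1, force_suppress=False):
    """Return output rows (kept boxes, then -1 padding) and each row's input index."""
    def coords(i):
        return boxes[i][coord_start:coord_start + 4]

    order = sorted(range(len(boxes)), key=lambda i: boxes[i][score_index], reverse=True)
    kept = []
    for i in order:
        if boxes[i][score_index] <= valid_thresh:
            continue  # filtered out before suppression
        suppressed = False
        for j in kept:
            same_class = id_index < 0 or boxes[i][id_index] == boxes[j][id_index]
            if (force_suppress or same_class) and iou_corner(coords(i), coords(j)) > overlap_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    pad = len(boxes) - len(kept)
    out = [list(boxes[i]) for i in kept] + [[-1] * len(boxes[0]) for _ in range(pad)]
    return out, kept + [-1] * pad

x = [[0, 0.5, 0.1, 0.1, 0.2, 0.2], [1, 0.4, 0.1, 0.1, 0.2, 0.2],
     [0, 0.3, 0.1, 0.1, 0.14, 0.14], [2, 0.6, 0.5, 0.5, 0.7, 0.8]]
out, src = box_nms_reference(x, overlap_thresh=0.1, coord_start=2, score_index=1,
                             id_index=0, force_suppress=True)
print(src)  # [3, 0, -1, -1]: rows 1 and 2 are suppressed by row 0

# Backward: route each output gradient back to its box's input position.
out_grad = [[0.1] * 6, [0.2] * 6, [0.3] * 6, [0.4] * 6]
in_grad = [[0.0] * 6 for _ in x]
for k, i in enumerate(src):
    if i >= 0:
        in_grad[i] = out_grad[k]
```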

Defined in src/operator/contrib/bounding_box.cc:L93

Parameters:
  • data (ndarray) – The input

  • overlap_thresh (float, optional, default=0.5) – Overlap (IoU) threshold above which the object with the smaller score is suppressed.

  • valid_thresh (float, optional, default=0) – Filter input boxes to those whose scores are greater than valid_thresh.

  • topk (int, optional, default='-1') – Apply NMS to the topk boxes with descending scores; -1 means no restriction.

  • coord_start (int, optional, default='2') – Start index of the consecutive 4 coordinates.

  • score_index (int, optional, default='1') – Index of the scores/confidence of boxes.

  • id_index (int, optional, default='-1') – Optional, index of the class categories, -1 to disable.

  • background_id (int, optional, default='-1') – Optional, id of the background class, which will be ignored in NMS.

  • force_suppress (boolean, optional, default=0) – Optional. If false and id_index is provided, NMS applies only to boxes belonging to the same category.

  • in_format ({'center', 'corner'},optional, default='corner') –

    The input box encoding type.

    “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].

  • out_format ({'center', 'corner'},optional, default='corner') –

    The output box encoding type.

    “corner” means boxes are encoded as [xmin, ymin, xmax, ymax]; “center” means boxes are encoded as [x, y, width, height].

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.broadcast_greater(lhs=None, rhs=None, out=None, name=None, **kwargs)

Returns the result of element-wise greater than (>) comparison operation with broadcasting.

Example:

x = [[ 1.,  1.,  1.],
     [ 1.,  1.,  1.]]

y = [[ 0.],
     [ 1.]]

broadcast_greater(x, y) = [[ 1.,  1.,  1.],
                           [ 0.,  0.,  0.]]
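The same comparison can be expressed in NumPy, where y of shape (2, 1) is broadcast across the three columns of x. NumPy yields booleans, so the sketch casts back to the input's float type to match the 1. / 0. values shown above. This is a NumPy illustration, not a call into MXNet.

```python
import numpy as np

x = np.ones((2, 3))
y = np.array([[0.0], [1.0]])  # shape (2, 1) broadcasts across the 3 columns of x

# Element-wise x > y after broadcasting, cast from bool back to float.
result = (x > y).astype(x.dtype)
print(result)
# [[1. 1. 1.]
#  [0. 0. 0.]]
```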

Defined in src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L84

Parameters:
  • lhs (ndarray) – First input to the function

  • rhs (ndarray) – Second input to the function

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.broadcast_like(lhs, rhs, lhs_axes=None, rhs_axes=None)

Broadcasts lhs to have the same shape as rhs.

Broadcasting is a mechanism that allows NDArrays to perform arithmetic operations with arrays of different shapes efficiently without creating multiple copies of arrays. See also Broadcasting for more explanation.

Broadcasting is allowed on axes with size 1, such as from (2,1,3,1) to (2,8,3,9). Elements will be duplicated on the broadcasted axes.
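A NumPy sketch of this rule: without axes, lhs is broadcast to the full shape of rhs; with lhs_axes/rhs_axes, only the listed lhs axes take their sizes from the paired rhs axes. np.broadcast_to likewise only stretches size-1 axes and raises ValueError otherwise. The helper broadcast_like_reference is hypothetical, and the axis-pairing behavior is inferred from the parameter descriptions.

```python
import numpy as np

def broadcast_like_reference(lhs, rhs, lhs_axes=None, rhs_axes=None):
    """Assumed broadcast_like semantics, expressed with np.broadcast_to."""
    if lhs_axes is None:
        target = rhs.shape                  # match rhs's full shape
    else:
        target = list(lhs.shape)
        for la, ra in zip(lhs_axes, rhs_axes):
            target[la] = rhs.shape[ra]      # take each listed size from rhs
        target = tuple(target)
    # Only size-1 axes can be stretched, mirroring the rule above.
    return np.broadcast_to(lhs, target)

a = np.zeros((2, 1, 3, 1))
b = np.zeros((2, 8, 3, 9))
print(broadcast_like_reference(a, b).shape)  # (2, 8, 3, 9)
print(broadcast_like_reference(np.array([9.0]), np.arange(5.0),
                               lhs_axes=(0,), rhs_axes=(-1,)))  # [9. 9. 9. 9. 9.]
```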

Parameters:
  • lhs (NDArray) – First input.

  • rhs (NDArray) – Second input.

  • lhs_axes (Shape or None, optional, default=None) – Axes to perform broadcast on in the first input array

  • rhs_axes (Shape or None, optional, default=None) – Axes to copy from the second input array

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Example

>>> a = np.array([[1,2,3]])
>>> b = np.array([[5,6,7],[7,8,9]])
>>> npx.broadcast_like(a, b)
array([[1., 2., 3.],
       [1., 2., 3.]])
>>> a = np.array([9])
>>> b = np.array([1,2,3,4,5])
>>> npx.broadcast_like(a, b, lhs_axes=(0,), rhs_axes=(-1,))
array([9., 9., 9., 9., 9.])
mxnet.ndarray.numpy_extension.cast(data=None, dtype=_Null, out=None, name=None, **kwargs)

Casts all elements of the input to a new type.

Note

The capitalized alias Cast is deprecated. Use cast instead.

Example:

cast([0.9, 1.3], dtype='int32') = [0, 1]
cast([1e20, 11.1], dtype='float16') = [inf, 11.09375]
cast([300, 11.1, 10.9, -1, -3], dtype='uint8') = [44, 11, 10, 255, 253]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L789

Parameters:
  • data (ndarray) – The input.

  • dtype ({'bfloat16', 'bool', 'float16', 'float32', 'float64', 'int16', 'int32', 'int64', 'int8', 'uint16', 'uint32', 'uint64', 'uint8'}, required) – Output data type.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
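The uint8 row of the cast example can be reproduced in plain Python under one assumption: values are truncated toward zero and then wrapped modulo 256. This models the documented output only; in C++ an out-of-range float-to-unsigned conversion is undefined behavior, so the real result may depend on the platform. cast_to_uint8 is a hypothetical helper.

```python
def cast_to_uint8(values):
    """Model of the documented uint8 cast (hypothetical, not MXNet code).

    int() truncates toward zero; `& 0xFF` keeps the low 8 bits, i.e. wraps
    modulo 256, so 300 -> 44 and -1 -> 255.
    """
    return [int(v) & 0xFF for v in values]


print(cast_to_uint8([300, 11.1, 10.9, -1, -3]))  # [44, 11, 10, 255, 253]
```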

mxnet.ndarray.numpy_extension.cond(pred, then_func, else_func, inputs, name='cond')

Run an if-then-else using user-defined condition and computation.

This operator simulates an if-like branch which chooses to do one of the two customized computations according to the specified condition.

pred is a scalar MXNet NDArray, indicating which branch of computation should be used.

then_func is a user-defined function, used as computation of the then branch. It produces outputs, which are a list of NDArrays. The signature of then_func should be then_func() => NDArray or nested List[NDArray].

else_func is a user-defined function, used as computation of the else branch. It produces outputs, which are a list of NDArrays. The signature of else_func should be else_func() => NDArray or nested List[NDArray].

The outputs produced by then_func and else_func should have the same number of elements, and corresponding outputs should have the same shape, dtype and stype.

This function returns an NDArray or a nested list of NDArrays, representing the computation result.

Parameters:
  • pred (a scalar NDArray.) – The branch condition.

  • then_func (a Python function.) – The computation to be executed if pred is true.

  • else_func (a Python function.) – The computation to be executed if pred is false.

Returns:

outputs

Return type:

an NDArray or nested lists of NDArrays, representing the result of computation.

Examples

>>> a, b = mx.np.array([1]), mx.np.array([2])
>>> pred = a * b < 5
>>> then_func = lambda: (a + 5) * (b + 5)
>>> else_func = lambda: (a - 5) * (b - 5)
>>> outputs = mx.npx.cond(pred, then_func, else_func)
>>> outputs[0]
42.0
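As a rough model of the semantics described above, an eager if-then-else evaluates pred once and calls exactly one branch. The plain-Python sketch below uses ordinary numbers in place of NDArrays; cond_sketch is hypothetical, takes no inputs argument, and skips the shape/dtype consistency checks between branches.

```python
def cond_sketch(pred, then_func, else_func):
    """Evaluate `pred` once and run exactly one zero-argument branch.

    Hypothetical helper: `pred` is anything convertible to a Python bool.
    """
    return then_func() if bool(pred) else else_func()


a, b = 1, 2
outputs = cond_sketch(a * b < 5,
                      lambda: (a + 5) * (b + 5),   # then branch: 6 * 7
                      lambda: (a - 5) * (b - 5))   # else branch, not evaluated
print(outputs)  # 42
```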
mxnet.ndarray.numpy_extension.constraint_check(input=None, msg=_Null, out=None, name=None, **kwargs)

This operator checks whether all the elements in a boolean tensor are true. If not, a ValueError exception is raised in the backend with the given error message. To evaluate this operator, multiply the original tensor by the return value of this operator; this forces the operator to become part of the computation graph, otherwise the check will not work in symbolic mode.

Parameters:
  • input (ndarray) – A boolean tensor.

  • msg (string) – The error message in the exception.

Returns:

out – If all the elements in the input tensor are true, array(True) is returned; otherwise a ValueError exception is raised before anything is returned.

Return type:

ndarray

Examples

>>> from mxnet import np, npx
>>> loc = np.zeros((2,2))
>>> scale = np.array([[1., 2.], [0.5, 3.]])  # any array of scale values
>>> constraint = (scale > 0)
>>> np.random.normal(loc,
...                  scale * npx.constraint_check(constraint, 'Scale should be larger than zero'))

If all elements in the scale tensor are greater than zero, npx.constraint_check returns np.array(True), and multiplying by it leaves scale unchanged. If some elements violate the constraint, i.e. the boolean tensor constraint contains False, a ValueError exception with the given message ‘Scale should be larger than zero’ is raised.

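The multiply-by-the-result pattern can be illustrated without MXNet. constraint_check_sketch is a hypothetical stand-in that returns True (playing the role of array(True)) or raises ValueError; multiplying a number by True leaves it unchanged.

```python
def constraint_check_sketch(flags, msg):
    """Return True if every flag is true, otherwise raise ValueError(msg).

    Hypothetical stand-in for npx.constraint_check on plain Python lists.
    """
    if not all(flags):
        raise ValueError(msg)
    return True


scale = [1.0, 2.0, 0.5]
ok = constraint_check_sketch([s > 0 for s in scale], 'Scale should be larger than zero')
checked = [s * ok for s in scale]  # values unchanged: [1.0, 2.0, 0.5]

try:
    constraint_check_sketch([s > 0 for s in [1.0, -2.0]], 'Scale should be larger than zero')
except ValueError as err:
    print(err)  # Scale should be larger than zero
```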
mxnet.ndarray.numpy_extension.contrib_calibrate_entropy(hist=None, hist_edges=None, num_quantized_bins=_Null, out=None, name=None, **kwargs)

Provide calibrated min/max for input histogram.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/calibrate.cc:L207

Parameters:
  • hist (ndarray) – An ndarray/symbol of type float32

  • hist_edges (ndarray) – An ndarray/symbol of type float32

  • num_quantized_bins (int, optional, default='255') – The number of quantized bins.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.contrib_quantize(data=None, min_range=None, max_range=None, out_type=_Null, out=None, name=None, **kwargs)

Quantize an input tensor from float to out_type, with user-specified min_range and max_range.

min_range and max_range are scalar floats that specify the range for the input data.

When out_type is uint8, the output is calculated using the following equation:

out[i] = (in[i] - min_range) * range(OUTPUT_TYPE) / (max_range - min_range) + 0.5,

where range(T) = numeric_limits<T>::max() - numeric_limits<T>::min().

When out_type is int8, the output is calculated using the following equation, which keeps the quantized values zero-centered:

out[i] = sign(in[i]) * min(abs(in[i]) * scale + 0.5f, quantized_range),

where quantized_range = MinAbs(max(int8), min(int8)) and scale = quantized_range / MaxAbs(min_range, max_range).
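Both quantization formulas can be evaluated by hand with a small sketch. It assumes scalar inputs, reads the int8 formula as sign(x) * min(abs(x) * scale + 0.5, quantized_range), and truncates toward zero the way a C++ static_cast to an integer type does. quantize_uint8 and quantize_int8 are hypothetical helpers, not MXNet functions.

```python
def quantize_uint8(x, min_range, max_range):
    """(x - min_range) * range(uint8) / (max_range - min_range) + 0.5, truncated."""
    uint8_range = 255 - 0  # numeric_limits<uint8>::max() - min()
    return int((x - min_range) * uint8_range / (max_range - min_range) + 0.5)


def quantize_int8(x, min_range, max_range):
    """Zero-centered int8 quantization with quantized_range = MinAbs(127, -128)."""
    quantized_range = min(abs(127), abs(-128))  # = 127
    scale = quantized_range / max(abs(min_range), abs(max_range))
    sign = (x > 0) - (x < 0)
    # int() truncates toward zero, like a C++ static_cast to an integer type.
    return int(sign * min(abs(x) * scale + 0.5, quantized_range))


print([quantize_uint8(v, -1.0, 1.0) for v in (-1.0, 0.0, 0.5, 1.0)])        # [0, 128, 191, 255]
print([quantize_int8(v, -1.0, 1.0) for v in (-1.0, -0.25, 0.0, 0.5, 1.0)])  # [-127, -32, 0, 64, 127]
```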

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantize.cc:L94

Parameters:
  • data (ndarray) – An ndarray/symbol of type float32

  • min_range (ndarray) – The minimum scalar value possibly produced for the input

  • max_range (ndarray) – The maximum scalar value possibly produced for the input

  • out_type ({'int8', 'uint8'},optional, default='uint8') – Output data type.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.contrib_quantize_v2(data=None, out_type=_Null, min_calib_range=_Null, max_calib_range=_Null, out=None, name=None, **kwargs)

Quantize an input tensor from float to out_type, with user-specified min_calib_range and max_calib_range or the input range collected at runtime.

Output min_range and max_range are scalar floats that specify the range for the input data.

When out_type is uint8, the output is calculated using the following equation:

out[i] = (in[i] - min_range) * range(OUTPUT_TYPE) / (max_range - min_range) + 0.5,

where range(T) = numeric_limits<T>::max() - numeric_limits<T>::min().

When out_type is int8, the output is calculated using the following equation, which keeps the quantized values zero-centered:

out[i] = sign(in[i]) * min(abs(in[i]) * scale + 0.5f, quantized_range),

where quantized_range = MinAbs(max(int8), min(int8)) and scale = quantized_range / MaxAbs(min_range, max_range).

When out_type is auto, the output type is automatically determined by min_calib_range if it is present. If min_calib_range < 0.0f, the output type will be int8; otherwise it will be uint8. If min_calib_range isn’t present, the output type will be int8.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantize_v2.cc:L104

Parameters:
  • data (ndarray) – An ndarray/symbol of type float32

  • out_type ({'auto', 'int8', 'uint8'},optional, default='int8') – Output data type. auto can be specified to automatically determine output type according to min_calib_range.

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32. If present, it will be used to quantize the fp32 data into int8 or uint8.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32. If present, it will be used to quantize the fp32 data into int8 or uint8.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
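The auto rule for contrib_quantize_v2's out_type reduces to a small decision function. resolve_out_type is a hypothetical helper that only restates that rule; it is not part of MXNet.

```python
def resolve_out_type(out_type, min_calib_range=None):
    """Return the concrete quantized type for out_type in {'auto', 'int8', 'uint8'}."""
    if out_type != 'auto':
        return out_type                  # explicit type is used as-is
    if min_calib_range is None:
        return 'int8'                    # no calibration range present
    return 'int8' if min_calib_range < 0.0 else 'uint8'


print(resolve_out_type('auto', -0.3), resolve_out_type('auto', 0.0))  # int8 uint8
```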

mxnet.ndarray.numpy_extension.contrib_quantized_rnn(data=None, parameters=None, state=None, state_cell=None, data_scale=None, data_shift=None, state_size=_Null, num_layers=_Null, bidirectional=_Null, mode=_Null, p=_Null, state_outputs=_Null, projection_size=_Null, lstm_state_clip_min=_Null, lstm_state_clip_max=_Null, lstm_state_clip_nan=_Null, use_sequence_length=_Null, out=None, name=None, **kwargs)

RNN operator for input data of type uint8. The weights of each gate are converted to int8, while the bias is accumulated in float32. The hidden state and cell state are in float32. For the input data, two additional float32 arguments (data_scale and data_shift) must be provided, giving the parameters used to quantize the data from float32 to uint8. The final outputs contain the recurrent result in float32. Only vanilla LSTM networks are supported for quantization.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_rnn.cc:L320

Parameters:
  • data (ndarray) – Input data.

  • parameters (ndarray) – weight.

  • state (ndarray) – initial hidden state of the RNN

  • state_cell (ndarray) – initial cell state for LSTM networks (only for LSTM)

  • data_scale (ndarray) – quantization scale of data.

  • data_shift (ndarray) – quantization shift of data.

  • state_size (int (non-negative), required) – size of the state for each layer

  • num_layers (int (non-negative), required) – number of stacked layers

  • bidirectional (boolean, optional, default=0) – whether to use bidirectional recurrent layers

  • mode ({'gru', 'lstm', 'rnn_relu', 'rnn_tanh'}, required) – the type of RNN to compute

  • p (float, optional, default=0) – drop rate of the dropout on the outputs of each RNN layer, except the last layer.

  • state_outputs (boolean, optional, default=0) – Whether to have the states as symbol outputs.

  • projection_size (int or None, optional, default='None') – size of the projection

  • lstm_state_clip_min (double or None, optional, default=None) – Minimum clip value of LSTM states. This option must be used together with lstm_state_clip_max.

  • lstm_state_clip_max (double or None, optional, default=None) – Maximum clip value of LSTM states. This option must be used together with lstm_state_clip_min.

  • lstm_state_clip_nan (boolean, optional, default=0) – Whether to stop NaN from propagating in state by clipping it to min/max. If clipping range is not specified, this option is ignored.

  • use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.convolution(data=None, weight=None, bias=None, kernel=None, stride=None, dilate=None, pad=None, num_filter=1, num_group=1, workspace=1024, no_bias=False, cudnn_tune=None, cudnn_off=False, layout=None)

Compute N-D convolution on (N+2)-D input.

In the 2-D convolution, given input data with shape (batch_size, channel, height, width), the output is computed by

\[out[n,i,:,:] = bias[i] + \sum_{j=0}^{channel-1} data[n,j,:,:] \star weight[i,j,:,:]\]

where \(\star\) is the 2-D cross-correlation operator.

For general 2-D convolution, the shapes are

  • data: (batch_size, channel, height, width)

  • weight: (num_filter, channel, kernel[0], kernel[1])

  • bias: (num_filter,)

  • out: (batch_size, num_filter, out_height, out_width).

Define:

f(x,k,p,s,d) = floor((x+2*p-d*(k-1)-1)/s)+1

then we have:

out_height=f(height, kernel[0], pad[0], stride[0], dilate[0])
out_width=f(width, kernel[1], pad[1], stride[1], dilate[1])
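The output-size rule f can be checked numerically. conv_out_size is a hypothetical helper that transcribes f directly; Python's // performs the floor.

```python
def conv_out_size(x, k, p=0, s=1, d=1):
    """f(x, k, p, s, d) = floor((x + 2*p - d*(k - 1) - 1) / s) + 1."""
    return (x + 2 * p - d * (k - 1) - 1) // s + 1


height, width = 32, 32
kernel, pad, stride, dilate = (3, 3), (1, 1), (2, 2), (1, 1)
out_height = conv_out_size(height, kernel[0], pad[0], stride[0], dilate[0])
out_width = conv_out_size(width, kernel[1], pad[1], stride[1], dilate[1])
print(out_height, out_width)  # 16 16
```

With stride 1 and pad 1, a 3-wide kernel preserves the size, while dilation 2 without padding shrinks 32 to 28.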

If no_bias is set to be true, then the bias term is ignored.

The default data layout is NCHW, namely (batch_size, channel, height, width). Other layouts, such as NHWC, can be selected with the layout parameter.

If num_group is larger than 1, denoted by g, then split the input data evenly into g parts along the channel axis, and also evenly split weight along the first dimension. Next compute the convolution on the i-th part of the data with the i-th weight part. The output is obtained by concatenating all the g results.

1-D convolution has no height dimension, only width. The shapes are

  • data: (batch_size, channel, width)

  • weight: (num_filter, channel, kernel[0])

  • bias: (num_filter,)

  • out: (batch_size, num_filter, out_width).
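For the 1-D case, the cross-correlation formula can be spelled out as explicit loops. conv1d_ncw is a hypothetical, unoptimized sketch for a single sample with num_group=1; it is meant to show the indexing, not to match MXNet's kernels.

```python
def conv1d_ncw(data, weight, bias, stride=1, pad=0, dilate=1):
    """Naive 1-D cross-correlation for one sample (NCW layout, num_group=1).

    data:   channel x width nested list
    weight: num_filter x channel x kernel nested list
    bias:   num_filter values
    Returns num_filter x out_width, with out_width = f(width, kernel, pad, stride, dilate).
    """
    channel, width = len(data), len(data[0])
    k = len(weight[0][0])
    # Zero-pad both ends of every input channel.
    padded = [[0.0] * pad + list(row) + [0.0] * pad for row in data]
    out_width = (width + 2 * pad - dilate * (k - 1) - 1) // stride + 1
    out = []
    for i, w_i in enumerate(weight):
        row = []
        for o in range(out_width):
            acc = bias[i]
            # Sum over input channels j and kernel taps t (no kernel flip).
            for j in range(channel):
                for t in range(k):
                    acc += padded[j][o * stride + t * dilate] * w_i[j][t]
            row.append(acc)
        out.append(row)
    return out


print(conv1d_ncw([[1., 2., 3., 4.]], [[[1., 0., -1.]]], [0.]))         # [[-2.0, -2.0]]
print(conv1d_ncw([[1., 2., 3., 4.]], [[[1., 0., -1.]]], [0.], pad=1))  # [[-2.0, -2.0, -2.0, 3.0]]
```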

3-D convolution adds an additional depth dimension besides height and width. The shapes are

  • data: (batch_size, channel, depth, height, width)

  • weight: (num_filter, channel, kernel[0], kernel[1], kernel[2])

  • bias: (num_filter,)

  • out: (batch_size, num_filter, out_depth, out_height, out_width).

Both weight and bias are learnable parameters.

There are other options to tune the performance.

  • cudnn_tune: enabling this option increases startup time but may speed up execution. Options are

    • off: no tuning

    • limited_workspace: run tests and pick the fastest algorithm that doesn’t exceed the workspace limit.

    • fastest: pick the fastest algorithm and ignore workspace limit.

    • None (default): the behavior is determined by environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT. 0 for off, 1 for limited workspace (default), 2 for fastest.

  • workspace: A large number leads to more (GPU) memory usage but may improve the performance.

Parameters:
  • data (NDArray) – Input data to the ConvolutionOp.

  • weight (NDArray) – Weight matrix.

  • bias (NDArray) – Bias parameter.

  • kernel (Shape(tuple), required) – Convolution kernel size: (w,), (h, w) or (d, h, w)

  • stride (Shape(tuple), optional, default=[]) – Convolution stride: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (w,), (h, w) or (d, h, w). Defaults to no padding.

  • num_filter (int (non-negative), required) – Convolution filter(channel) number

  • num_group (int (non-negative), optional, default=1) – Number of group partitions.

  • workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed (MB) in convolution. This parameter has two usages. When CUDNN is not used, it determines the effective batch size of the convolution kernel. When CUDNN is used, it controls the maximum temporary storage used for tuning the best CUDNN kernel when limited_workspace strategy is used.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algo by running performance test.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn for this layer.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC'},optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d. NHWC and NDHWC are only supported on GPU.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

mxnet.ndarray.numpy_extension.ctc_loss(data=None, label=None, data_lengths=None, label_lengths=None, use_data_lengths=_Null, use_label_lengths=_Null, blank_label=_Null, out=None, name=None, **kwargs)

Connectionist Temporal Classification Loss.

Note

The existing alias contrib_CTCLoss is deprecated.

The shapes of the inputs and outputs:

  • data: (sequence_length, batch_size, alphabet_size)

  • label: (batch_size, label_sequence_length)

  • out: (batch_size)

The data tensor consists of sequences of activation vectors (without applying softmax), with the i-th channel in the last dimension corresponding to the i-th label for i between 0 and alphabet_size-1 (i.e., always 0-indexed). The alphabet size should include one additional value reserved for the blank label. When blank_label is "first", the 0-th channel is reserved for the activation of the blank label; when it is "last", the (alphabet_size-1)-th channel is reserved for the blank label.

label is an index matrix of integers. When blank_label is "first", the value 0 is then reserved for blank label, and should not be passed in this matrix. Otherwise, when blank_label is "last", the value (alphabet_size-1) is reserved for blank label.

If a sequence of labels is shorter than label_sequence_length, use the special padding value at the end of the sequence to conform it to the correct length. The padding value is 0 when blank_label is "first", and -1 otherwise.

For example, suppose the vocabulary is [a, b, c], and in one batch we have three sequences ‘ba’, ‘cbb’, and ‘abac’. When blank_label is "first", we can index the labels as {‘a’: 1, ‘b’: 2, ‘c’: 3}, and we reserve the 0-th channel for blank label in data tensor. The resulting label tensor should be padded to be:

[[2, 1, 0, 0], [3, 2, 2, 0], [1, 2, 1, 3]]

When blank_label is "last", we can index the labels as {‘a’: 0, ‘b’: 1, ‘c’: 2}, and we reserve the channel index 3 for blank label in data tensor. The resulting label tensor should be padded to be:

[[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]]
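The indexing and padding rules above can be reproduced with a short sketch. pad_ctc_labels is a hypothetical helper; it pads to the longest sequence in the batch, whereas a real label_sequence_length could be larger.

```python
def pad_ctc_labels(sequences, vocab, blank_label='first'):
    """Index and pad label sequences following the blank_label convention.

    'first': tokens are numbered from 1 (0 is the blank) and padding is 0.
    'last':  tokens are numbered from 0 (alphabet_size-1 is the blank) and padding is -1.
    """
    offset, pad_value = (1, 0) if blank_label == 'first' else (0, -1)
    index = {tok: i + offset for i, tok in enumerate(vocab)}
    max_len = max(len(s) for s in sequences)
    return [[index[c] for c in s] + [pad_value] * (max_len - len(s))
            for s in sequences]


seqs = ['ba', 'cbb', 'abac']
print(pad_ctc_labels(seqs, 'abc', 'first'))  # [[2, 1, 0, 0], [3, 2, 2, 0], [1, 2, 1, 3]]
print(pad_ctc_labels(seqs, 'abc', 'last'))   # [[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]]
```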

out is a list of CTC loss values, one per example in the batch.

See Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A. Graves et al. for more information on the definition and the algorithm.

Defined in /home/smola/mxnet/src/operator/nn/ctc_loss.cc:L104

Parameters:
  • data (ndarray) – Input ndarray

  • label (ndarray) – Ground-truth labels for the loss.

  • data_lengths (ndarray) – Lengths of data for each of the samples. Only required when use_data_lengths is true.

  • label_lengths (ndarray) – Lengths of labels for each of the samples. Only required when use_label_lengths is true.

  • use_data_lengths (boolean, optional, default=0) – Whether the data lengths are decided by data_lengths. If false, the lengths are equal to the max sequence length.

  • use_label_lengths (boolean, optional, default=0) – Whether the label lengths are decided by label_lengths, or derived from padding_mask. If false, the lengths are derived from the first occurrence of the value of padding_mask. The value of padding_mask is 0 when the first CTC label is reserved for blank, and -1 when the last label is reserved for blank. See blank_label.

  • blank_label ({'first', 'last'},optional, default='first') – Set the label that is reserved for blank label. If “first”, 0-th label is reserved, label values for tokens in the vocabulary are between 1 and alphabet_size-1, and the padding mask is 0. If “last”, last label value alphabet_size-1 is reserved for blank label instead, label values for tokens in the vocabulary are between 0 and alphabet_size-2, and the padding mask is -1.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.deconvolution(data=None, weight=None, bias=None, kernel=None, stride=None, dilate=None, pad=None, adj=None, target_shape=None, num_filter=1, num_group=1, workspace=1024, no_bias=False, cudnn_tune=None, cudnn_off=False, layout=None)

Computes 1D, 2D or 3D transposed convolution (aka fractionally strided convolution) of the input tensor. This operation can be seen as the gradient of Convolution operation with respect to its input. Convolution usually reduces the size of the input. Transposed convolution works the other way, going from a smaller input to a larger output while preserving the connectivity pattern.

Parameters:
  • data (NDArray) – Input tensor to the deconvolution operation.

  • weight (NDArray) – Weights representing the kernel.

  • bias (NDArray) – Bias added to the result after the deconvolution operation.

  • kernel (Shape(tuple), required) – Deconvolution kernel size: (w,), (h, w) or (d, h, w). This is the same as the kernel size used for the corresponding convolution

  • stride (Shape(tuple), optional, default=[]) – The stride used for the corresponding convolution: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • dilate (Shape(tuple), optional, default=[]) – Dilation factor for each dimension of the input: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – The amount of implicit zero padding added during convolution for each dimension of the input: (w,), (h, w) or (d, h, w). (kernel-1)/2 is usually a good choice. If target_shape is set, pad will be ignored and a padding that will generate the target shape will be used. Defaults to no padding.

  • adj (Shape(tuple), optional, default=[]) – Adjustment for output shape: (w,), (h, w) or (d, h, w). If target_shape is set, adj will be ignored and computed accordingly.

  • target_shape (Shape(tuple), optional, default=[]) – Shape of the output tensor: (w,), (h, w) or (d, h, w).

  • num_filter (int (non-negative), required) – Number of output filters.

  • num_group (int (non-negative), optional, default=1) – Number of group partitions.

  • workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed (MB) in deconvolution. This parameter has two usages. When CUDNN is not used, it determines the effective batch size of the deconvolution kernel. When CUDNN is used, it controls the maximum temporary storage used for tuning the best CUDNN kernel when limited_workspace strategy is used.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algorithm by running performance test.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn for this layer.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC'},optional, default='None') – Set layout for input, output and weight. Empty for default layout, NCW for 1d, NCHW for 2d and NCDHW for 3d. NHWC and NDHWC are only supported on GPU.

  • out (NDArray, optional) – The output NDArray to hold the result.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
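The deconvolution parameters above do not spell out the output length. The sketch below assumes the usual transposed-convolution relation, out = stride*(in-1) + dilate*(kernel-1) + 1 - 2*pad + adj, which inverts f from the convolution section; treat it as an assumption, not a statement about MXNet's shape inference. It also shows why adj exists: with stride 2, lengths 31 and 32 both convolve down to 16, and adj chooses between them. Both helpers are hypothetical.

```python
def conv_out_size(x, k, p=0, s=1, d=1):
    """Convolution output length f(x, k, p, s, d)."""
    return (x + 2 * p - d * (k - 1) - 1) // s + 1


def deconv_out_size(x, k, p=0, s=1, d=1, adj=0):
    """Transposed-convolution output length (assumed standard relation)."""
    return s * (x - 1) + d * (k - 1) + 1 - 2 * p + adj


print(deconv_out_size(16, 3, p=1, s=2))         # 31
print(deconv_out_size(16, 3, p=1, s=2, adj=1))  # 32
print(conv_out_size(31, 3, 1, 2), conv_out_size(32, 3, 1, 2))  # 16 16
```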

mxnet.ndarray.numpy_extension.deformable_convolution(data=None, offset=None, weight=None, bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, num_deformable_group=_Null, workspace=_Null, no_bias=_Null, layout=_Null, out=None, name=None, **kwargs)

Compute 2-D deformable convolution on 4-D input.

The deformable convolution operation is described in https://arxiv.org/abs/1703.06211

For 2-D deformable convolution, the shapes are

  • data: (batch_size, channel, height, width)

  • offset: (batch_size, num_deformable_group * kernel[0] * kernel[1] * 2, height, width)

  • weight: (num_filter, channel, kernel[0], kernel[1])

  • bias: (num_filter,)

  • out: (batch_size, num_filter, out_height, out_width).

Define:

f(x,k,p,s,d) = floor((x+2*p-d*(k-1)-1)/s)+1

then we have:

out_height=f(height, kernel[0], pad[0], stride[0], dilate[0])
out_width=f(width, kernel[1], pad[1], stride[1], dilate[1])

If no_bias is set to be true, then the bias term is ignored.

The default data layout is NCHW, namely (batch_size, channel, height, width).

If num_group is larger than 1, denoted by g, then split the input data evenly into g parts along the channel axis, and also evenly split weight along the first dimension. Next compute the convolution on the i-th part of the data with the i-th weight part. The output is obtained by concatenating all the g results.

If num_deformable_group is larger than 1, denoted by dg, then split the input offset evenly into dg parts along the channel axis, and also evenly split data into dg parts along the channel axis. Next, compute the deformable convolution, applying the i-th part of the offset to the i-th part of the data.

Both weight and bias are learnable parameters.

Defined in /home/smola/mxnet/src/operator/deformable_convolution.cc:L80

Parameters:
  • data (ndarray) – Input data to the DeformableConvolutionOp.

  • offset (ndarray) – Input offset to the DeformableConvolutionOp.

  • weight (ndarray) – Weight matrix.

  • bias (ndarray) – Bias parameter.

  • kernel (Shape(tuple), required) – Convolution kernel size: (h, w) or (d, h, w)

  • stride (Shape(tuple), optional, default=[]) – Convolution stride: (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (h, w) or (d, h, w). Defaults to no padding.

  • num_filter (long, required) – Convolution filter(channel) number

  • num_group (long, optional, default=1) – Number of group partitions.

  • num_deformable_group (long, optional, default=1) – Number of deformable group partitions.

  • workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed for convolution (MB).

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW'},optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.digamma(data=None, out=None, name=None, **kwargs)

Returns element-wise log derivative of the gamma function of the input.

The storage type of digamma output is always dense.

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
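Since digamma is the derivative of log Gamma, it can be approximated numerically from the standard library's math.lgamma. digamma_approx is a hypothetical central-difference sketch for positive inputs, useful only as a sanity check, not a substitute for the operator.

```python
import math


def digamma_approx(x, h=1e-5):
    """Central-difference estimate of d/dx log(Gamma(x)) for x > 0."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)


print(round(digamma_approx(1.0), 6))  # -0.577216, minus the Euler-Mascheroni constant
```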

mxnet.ndarray.numpy_extension.dropout(data, p=0.5, mode='training', axes=None, cudnn_off=False, **kwargs)

Applies dropout operation to input array.

  • During training, each element of the input is set to zero with probability p. The whole array is rescaled by \(1/(1-p)\) to keep the expected sum of the input unchanged.

  • During testing, this operator does not change the input if mode is ‘training’. If mode is ‘always’, the same computation as during training will be applied.

Parameters:
  • data (NDArray) – Input array to which dropout will be applied.

  • p (float, optional, default=0.5) – Fraction of the input that gets dropped out during training time.

  • mode ({'always', 'training'},optional, default='training') – Whether to only turn on dropout during training or to also turn on for inference.

  • axes (Shape(tuple), optional, default=[]) – Axes for variational dropout kernel.

  • cudnn_off (boolean or None, optional, default=0) – Whether to turn off cudnn in dropout operator. This option is ignored if axes is specified.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
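The training-time rule (zero with probability p, rescale survivors by 1/(1-p)) is commonly called inverted dropout. dropout_sketch is a hypothetical pure-Python illustration; its training flag stands in for whether dropout is active (training, or mode='always'), and it assumes 0 <= p < 1.

```python
import random


def dropout_sketch(values, p=0.5, training=True, rng=None):
    """Zero each element with probability p and scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)  # identity when dropout is inactive
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


out = dropout_sketch([1.0] * 8, p=0.5, rng=random.Random(0))
print(out)  # each entry is either 0.0 or 2.0
```

Scaling the kept elements keeps the expected value of each element, and hence of the sum, unchanged.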

mxnet.ndarray.numpy_extension.embedding(data, weight, input_dim=None, output_dim=None, dtype='float32', sparse_grad=False, **kwargs)

Maps integer indices to vector representations (embeddings).

This operator maps words to real-valued vectors in a high-dimensional space, called word embeddings. These embeddings can capture semantic and syntactic properties of the words. For example, it has been noted that in the learned embedding spaces, similar words tend to be close to each other and dissimilar words far apart.

For an input array of shape (d1, …, dK), the shape of an output array is (d1, …, dK, output_dim). All the input values should be integers in the range [0, input_dim).

If the input_dim is ip0 and output_dim is op0, then shape of the embedding weight matrix must be (ip0, op0).

When “sparse_grad” is False, if any index mentioned is too large, it is replaced by the index that addresses the last vector in an embedding matrix. When “sparse_grad” is True, an error will be raised if invalid indices are found.
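The lookup and the out-of-range rule for sparse_grad=False can be sketched in plain Python for 1-D index lists. embedding_lookup is hypothetical; it rejects negative indices because the text only defines behavior for indices in [0, input_dim) and for too-large ones.

```python
def embedding_lookup(indices, weight):
    """Return one row of `weight` per integer index (1-D indices only).

    An index >= input_dim is replaced by input_dim - 1, the last row.
    """
    last = len(weight) - 1  # input_dim - 1
    rows = []
    for i in indices:
        i = int(i)
        if i < 0:
            raise ValueError("indices must be non-negative")
        rows.append(list(weight[min(i, last)]))
    return rows


weight = [[0., 1.], [2., 3.], [4., 5.]]     # input_dim=3, output_dim=2
print(embedding_lookup([1, 0, 7], weight))  # [[2.0, 3.0], [0.0, 1.0], [4.0, 5.0]]
```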

The storage type of weight can be either row_sparse or default.

Note

If “sparse_grad” is set to True, the storage type of the gradient w.r.t. weights will be “row_sparse”. Only a subset of optimizers support sparse gradients, including SGD, AdaGrad and Adam. Note that lazy updates are turned on by default, which may perform differently from standard updates. For more details, please check the Optimization API at: https://mxnet.apache.org/versions/master/api/python/docs/api/optimizer/index.html

Parameters:
  • data (NDArray) – The input array to the embedding operator.

  • weight (NDArray) – The embedding weight matrix.

  • input_dim (long, required) – Vocabulary size of the input indices.

  • output_dim (long, required) – Dimension of the embedding vectors.

  • dtype ({'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'}, optional, default='float32') – Data type of weight.

  • sparse_grad (boolean, optional, default=0) – Compute row sparse gradient in the backward calculation. If set to True, the grad’s storage type is row_sparse.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Example

>>> input_dim = 4
>>> output_dim = 5

Each row in weight matrix y represents a word. So, y = (w0,w1,w2,w3)

>>> y = np.arange(input_dim * output_dim).reshape(input_dim, output_dim)
>>> y
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]])

Input array x represents n-grams (2-grams). So, x = [(w1,w3), (w0,w2)]

>>> x = np.array([[1., 3.], [0., 2.]])
>>> x
array([[1., 3.],
       [0., 2.]])

Map the input x to its vector representations in y:

>>> npx.embedding(x, y, input_dim, output_dim)
array([[[ 5.,  6.,  7.,  8.,  9.],
        [15., 16., 17., 18., 19.]],

       [[ 0.,  1.,  2.,  3.,  4.],
        [10., 11., 12., 13., 14.]]])

mxnet.ndarray.numpy_extension.erf(data=None, out=None, name=None, **kwargs)

Returns element-wise Gauss error function of the input.

Example:

erf([0, -1., 10.]) = [0., -0.8427, 1.]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L1015

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.erfinv(data=None, out=None, name=None, **kwargs)

Returns element-wise inverse Gauss error function of the input.

Example:

erfinv([0, 0.5, -1.]) = [0., 0.4769, -inf]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L1036

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
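The erf and erfinv examples can be cross-checked with the standard library, which provides math.erf but no inverse. erfinv_sketch is a hypothetical bisection-based inverse, chosen for clarity rather than speed.

```python
import math


def erfinv_sketch(y, iterations=200):
    """Invert math.erf on (-1, 1) by bisection; returns -inf/inf at y = -1/1."""
    if y <= -1.0:
        return -math.inf
    if y >= 1.0:
        return math.inf
    lo, hi = -6.0, 6.0  # erf(-6) and erf(6) round to -1.0 and 1.0 in double precision
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(math.erf(-1.0), 4))       # -0.8427, as in the erf example
print(round(erfinv_sketch(0.5), 4))   # 0.4769
print(erfinv_sketch(-1.0))            # -inf
```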

mxnet.ndarray.numpy_extension.foreach(body, data, init_states, name='foreach')

Run a for loop with user-defined computation over NDArrays on dimension 0.

This operator simulates a for loop and body has the computation for an iteration of the for loop. It runs the computation in body on each slice from the input NDArrays.

body takes two arguments as input and outputs a tuple of two elements, as illustrated below:

out, states = body(data1, states)

data1 can be either an NDArray or a list of NDArrays. If data is an NDArray, data1 is an NDArray; otherwise, data1 is a list of NDArrays with the same length as data. states is a list of NDArrays with the same length as init_states. Similarly, out can be either an NDArray or a list of NDArrays; the out values of all iterations are concatenated to form the first output of foreach, and the states from the last execution of body form the second output.

The computation done by this operator is equivalent to the following pseudo code when the input data is a single NDArray:

states = init_states
outs = []
for i in range(data.shape[0]):
    s = data[i]
    out, states = body(s, states)
    outs.append(out)
outs = stack(*outs)

Parameters:
  • body (HybridBlock.) – Define computation in an iteration.

  • data (an NDArray or a list of NDArrays.) – The input data.

  • init_states (an NDArray or nested lists of NDArrays.) – The initial values of the loop states.

Returns:

  • outputs (an NDArray or nested lists of NDArrays.) – The output data concatenated from the output of all iterations.

  • states (an NDArray or nested lists of NDArrays.) – The loop states in the last iteration.
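
The following plain-NumPy sketch of the loop semantics traces what the Examples call below computes. `foreach_reference` is a hypothetical helper, not part of MXNet, and it only handles a single array for `data`.

```python
import numpy as np

def foreach_reference(body, data, init_states):
    """Hypothetical helper mirroring the pseudo code above for a single array `data`."""
    states = init_states
    outs = []
    for i in range(data.shape[0]):        # iterate over dimension 0
        out, states = body(data[i], states)
        outs.append(out)
    return np.stack(outs), states         # stacked outputs, final states

step = lambda data, states: (data + states[0], [states[0] * 2])
rng = np.random.default_rng(0)
data = rng.uniform(size=(2, 10))
init = [rng.uniform(size=10)]
outs, states = foreach_reference(step, data, init)
# Iteration 0 adds s, iteration 1 adds 2*s, and the final state is 4*s.
```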

Examples

>>> step = lambda data, states: (data + states[0], [states[0] * 2])
>>> data = mx.np.random.uniform(size=(2, 10))
>>> states = [mx.np.random.uniform(size=(10))]
>>> outs, states = npx.control_flow.foreach(step, data, states)

mxnet.ndarray.numpy_extension.fully_connected(x, weight, bias=None, num_hidden=None, no_bias=True, flatten=True, **kwargs)

Applies a linear transformation: \(Y = XW^T + b\).

If flatten is true, the shapes are:

  • data: (batch_size, x1, x2, …, xn)

  • weight: (num_hidden, x1 * x2 * … * xn)

  • bias: (num_hidden,)

  • out: (batch_size, num_hidden)

If flatten is false, the shapes are:

  • data: (x1, x2, …, xn, input_dim)

  • weight: (num_hidden, input_dim)

  • bias: (num_hidden,)

  • out: (x1, x2, …, xn, num_hidden)

The learnable parameters include both weight and bias.

If no_bias is true, the bias term is ignored.

Note

The sparse support for FullyConnected is limited to forward evaluation with row_sparse weight and bias, where the length of weight.indices and bias.indices must be equal to num_hidden. This could be useful for model inference with row_sparse weights trained with importance sampling or noise contrastive estimation.

To compute linear transformation with ‘csr’ sparse data, sparse.dot is recommended instead of sparse.FullyConnected.

Parameters:
  • data (NDArray) – Input data.

  • weight (NDArray) – Weight matrix.

  • bias (NDArray) – Bias parameter.

  • num_hidden (int, required) – Number of hidden nodes of the output.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
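
The two shape conventions can be illustrated with a plain-NumPy sketch of \(Y = XW^T + b\). `fully_connected_reference` is a hypothetical helper, not the MXNet kernel, and it does not cover the sparse case from the note.

```python
import numpy as np

def fully_connected_reference(x, weight, bias=None, flatten=True):
    """Hypothetical NumPy helper computing Y = X W^T + b."""
    if flatten:
        # (batch_size, x1, ..., xn) -> (batch_size, x1 * ... * xn)
        x = x.reshape(x.shape[0], -1)
    # Contract the last axis of x with the input axis of weight (num_hidden, in_dim).
    y = x @ weight.T
    if bias is not None:
        y = y + bias                     # broadcast over all leading axes
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4))
w_flat = rng.standard_normal((5, 12))    # num_hidden=5, in_dim=3*4
w_last = rng.standard_normal((5, 4))     # num_hidden=5, in_dim=4
b = rng.standard_normal(5)
print(fully_connected_reference(x, w_flat, b).shape)                 # (2, 5)
print(fully_connected_reference(x, w_last, b, flatten=False).shape)  # (2, 3, 5)
```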

mxnet.ndarray.numpy_extension.gamma(data=None, out=None, name=None, **kwargs)

Returns the gamma function (extension of the factorial function to the reals), computed element-wise on the input array.

The storage type of the gamma output is always dense.

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
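
The factorial connection can be checked with Python's scalar `math.gamma`. This is a standard-library illustration, not the MXNet operator.

```python
import math

# Gamma extends the factorial: gamma(n) == (n - 1)! for positive integers n.
print(math.gamma(5.0), math.factorial(4))  # 24.0 24
# It is also defined between integers, e.g. gamma(1/2)**2 equals pi.
print(math.gamma(0.5) ** 2)
```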

mxnet.ndarray.numpy_extension.gammaln(data=None, out=None, name=None, **kwargs)

Returns element-wise log of the absolute value of the gamma function of the input.

The storage type of the gammaln output is always dense.

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
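
The role of the absolute value, and the reason to prefer gammaln for large inputs, can be seen with Python's scalar `math.lgamma` (a standard-library illustration, not the MXNet operator):

```python
import math

x = -0.5
print(math.gamma(x))                                 # negative for this x
print(math.lgamma(x), math.log(abs(math.gamma(x))))  # equal up to rounding
# lgamma stays finite where gamma itself overflows a float64 (e.g. x = 200).
print(math.lgamma(200.0))
```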

mxnet.ndarray.numpy_extension.gather_nd(data=None, indices=None, out=None, name=None, **kwargs)

Gathers elements or slices from data and stores them in a tensor whose shape is defined by indices.

Given data with shape (X_0, X_1, …, X_{N-1}) and indices with shape (M, Y_0, …, Y_{K-1}), the output will have shape (Y_0, …, Y_{K-1}, X_M, …, X_{N-1}), where M <= N. If M == N, output shape will simply be (Y_0, …, Y_{K-1}).

The elements of the output are defined as follows:

output[y_0, ..., y_{K-1}, x_M, ..., x_{N-1}] = data[indices[0, y_0, ..., y_{K-1}],
                                                    ...,
                                                    indices[M-1, y_0, ..., y_{K-1}],
                                                    x_M, ..., x_{N-1}]

Examples:

data = [[0, 1], [2, 3]]
indices = [[1, 1, 0], [0, 1, 0]]
gather_nd(data, indices) = [2, 3, 0]

data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
indices = [[0, 1], [1, 0]]
gather_nd(data, indices) = [[3, 4], [5, 6]]

Parameters:
  • data (ndarray) – The array to gather from, of shape (X_0, X_1, …, X_{N-1}).

  • indices (ndarray) – Integer index array of shape (M, Y_0, …, Y_{K-1}).

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
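
In NumPy terms, the definition above is advanced indexing with the M rows of indices, one row per leading axis. The following sketch uses plain NumPy with a hypothetical helper name, and reproduces both examples:

```python
import numpy as np

def gather_nd_reference(data, indices):
    """Hypothetical helper: indices has shape (M, Y_0, ..., Y_{K-1})."""
    indices = np.asarray(indices)
    # Each of the M index arrays addresses one leading axis of data; the
    # remaining axes X_M, ..., X_{N-1} are carried through as slices.
    return np.asarray(data)[tuple(indices)]

print(gather_nd_reference([[0, 1], [2, 3]], [[1, 1, 0], [0, 1, 0]]))
# [2 3 0]
print(gather_nd_reference([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[0, 1], [1, 0]]))
# [[3 4]
#  [5 6]]
```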

mxnet.ndarray.numpy_extension.group_norm(data, gamma, beta, num_groups=1, eps=0.001, output_mean_var=False)

Group normalization.

The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. The mean and standard deviation are calculated separately over each group.

\[data = data.reshape((N, num\_groups, C // num\_groups, ...))\]

\[out = \frac{data - mean(data, axis)}{\sqrt{var(data, axis) + \epsilon}} * gamma + beta\]

Both gamma and beta are learnable parameters.

Defined in ../src/operator/nn/group_norm.cc:L78

Parameters:
  • data (NDArray) – Input data

  • gamma (NDArray) – gamma array

  • beta (NDArray) – beta array

  • num_groups (int, optional, default='1') – Total number of groups.

  • eps (float, optional, default=9.99999975e-06) – An epsilon parameter to prevent division by 0.

  • output_mean_var (boolean, optional, default=0) – Output the mean and std calculated along the given axis.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
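
The following plain-NumPy sketch covers only the normalization step of the formula above (reshape into groups, then standardize each group). It deliberately omits gamma and beta, because this section does not state their expected shape. `group_norm_core` is a hypothetical helper name.

```python
import numpy as np

def group_norm_core(data, num_groups, eps=1e-5):
    """Hypothetical helper: group-norm normalization without gamma/beta.

    data has shape (N, C, ...) with C divisible by num_groups.
    """
    n, c = data.shape[:2]
    grouped = data.reshape(n, num_groups, c // num_groups, *data.shape[2:])
    axes = tuple(range(2, grouped.ndim))          # every axis inside one group
    mean = grouped.mean(axis=axes, keepdims=True)
    var = grouped.var(axis=axes, keepdims=True)
    normed = (grouped - mean) / np.sqrt(var + eps)
    return normed.reshape(data.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 4, 4)) * 3.0 + 1.0
y = group_norm_core(x, num_groups=3)
per_group = y.reshape(2, 3, -1)
print(np.abs(per_group.mean(axis=-1)).max() < 1e-6)  # True
```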

mxnet.ndarray.numpy_extension.index_add(a=None, ind=None, val=None, out=None, name=None, **kwargs)

Adds values to the input at the given indices. If a position appears more than once, the update values for that position are accumulated.

Parameters:
  • a (ndarray) – Input data. The array to be updated.

  • ind (ndarray) –

    Indices indicating the update positions. For example, array([[0, 1], [2, 3], [4, 5]]) indicates two positions to be updated: (0, 2, 4) and (1, 3, 5). Note: ‘ind’ cannot be an empty array ‘[]’; in that case, use the ‘add’ operator instead.

    • 0 <= ind.ndim <= 2.

    • ind.dtype should be ‘int32’ or ‘int64’

  • val (ndarray) – Input data. The array to update the input ‘a’.

Returns:

out – The output array.

Return type:

ndarray
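
The accumulation semantics can be sketched in plain NumPy with `np.add.at`, which is unbuffered, so repeated positions accumulate instead of overwriting each other. The helper name is hypothetical, and the sketch assumes a 2-D ind as in the Examples below, which it reproduces.

```python
import numpy as np

def index_add_reference(a, ind, val):
    """Hypothetical helper: a copy of a with val added at the positions in ind.

    ind has shape (M, P): column j is one position over the first M axes of a.
    """
    out = np.array(a, copy=True)
    # np.add.at accumulates on repeated positions; val is broadcast against
    # the indexed slices.
    np.add.at(out, tuple(np.asarray(ind)), val)
    return out

a = np.zeros((2, 3, 4))
val = np.arange(2) + 1
b1 = index_add_reference(a, [[0, 0], [0, 0], [0, 1]], val)
b2 = index_add_reference(a, [[0, 0], [0, 0], [0, 0]], val)   # repeated position
b3 = index_add_reference(a, [[0, 0], [0, 1]], np.arange(4))   # broadcast val
print(b2[0, 0, 0])  # 3.0
```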

Examples

>>> a = np.zeros((2, 3, 4))
>>> ind = np.array([[0, 0], [0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(2).reshape(2) + 1
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[1., 2., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> ind = np.array([[0, 0], [0, 0], [0, 0]], dtype='int32')  # accumulate values in repeated positions
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[3., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> ind=np.array([[0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(8).reshape(2, 4)
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [4., 5., 6., 7.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> val = np.arange(4).reshape(4)  # broadcast 'val'
>>> b = npx.index_add(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [0., 1., 2., 3.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

mxnet.ndarray.numpy_extension.index_update(a=None, ind=None, val=None, out=None, name=None, **kwargs)

Updates the input with new values at the given indices. If multiple indices refer to the same location, it is undefined which update is chosen; the updates may be applied in an arbitrary and nondeterministic order (e.g., due to concurrent updates on some hardware platforms). Avoid repeated positions.

Parameters:
  • a (ndarray) – Input data. The array to be updated. Supported dtypes: ‘float32’, ‘float64’, ‘int32’, ‘int64’.

  • ind (ndarray) –

    Indices indicating the update positions. For example, array([[0, 1], [2, 3], [4, 5]]) indicates two positions to be updated: (0, 2, 4) and (1, 3, 5). Note: ‘ind’ cannot be an empty array ‘[]’; in that case, use the ‘add’ operator instead.

    • 0 <= ind.ndim <= 2.

    • ind.dtype should be ‘int32’ or ‘int64’

  • val (ndarray) – Input data. The array used to update the input ‘a’. Supported dtypes: ‘float32’, ‘float64’, ‘int32’, ‘int64’.

Returns:

out – The output array.

Return type:

ndarray
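
For distinct positions, the update amounts to NumPy fancy-index assignment. The following sketch (hypothetical helper, plain NumPy, 2-D ind assumed) reproduces the second example below. NumPy also does not document which write wins on repeated positions, so the same advice to avoid them applies.

```python
import numpy as np

def index_update_reference(a, ind, val):
    """Hypothetical helper: a copy of a with val written at the positions in ind."""
    out = np.array(a, copy=True)
    # Fancy-index assignment broadcasts val against the indexed slices.
    out[tuple(np.asarray(ind))] = val
    return out

a = np.zeros((2, 3, 4))
b = index_update_reference(a, [[0, 0], [0, 1]], np.arange(8).reshape(2, 4))
print(b[0, :2])
# [[0. 1. 2. 3.]
#  [4. 5. 6. 7.]]
```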

Examples

>>> a = np.zeros((2, 3, 4))
>>> ind = np.array([[0, 0], [0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(2).reshape(2) + 1
>>> b = npx.index_update(a, ind, val)
>>> b
array([[[1., 2., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> ind=np.array([[0, 0], [0, 1]], dtype='int32')
>>> val = np.arange(8).reshape(2, 4)
>>> b = npx.index_update(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [4., 5., 6., 7.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> val = np.arange(4).reshape(4)  # broadcast 'val'
>>> b = npx.index_update(a, ind, val)
>>> b
array([[[0., 1., 2., 3.],
        [0., 1., 2., 3.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

mxnet.ndarray.numpy_extension.instance_norm(data=None, gamma=None, beta=None, eps=_Null, out=None, name=None, **kwargs)

Applies instance normalization to the n-dimensional input array.

This operator takes an n-dimensional input array (n > 2) and normalizes it using the following formula:

\[out = \frac{x - mean[data]}{ \sqrt{Var[data] + \epsilon}} * gamma + beta\]

This layer is similar to batch normalization layer (BatchNorm) with two differences: first, the normalization is carried out per example (instance), not over a batch. Second, the same normalization is applied both at test and train time. This operation is also known as contrast normalization.

If the input data has shape [batch, channel, spatial_dim1, spatial_dim2, …], the gamma and beta parameters must be vectors of shape [channel].

This implementation is based on this paper [1].

Examples:

// Input of shape (2,1,2)
x = [[[ 1.1,  2.2]],
     [[ 3.3,  4.4]]]

// gamma parameter of length 1
gamma = [1.5]

// beta parameter of length 1
beta = [0.5]

// Instance normalization is calculated with the above formula
InstanceNorm(x,gamma,beta) = [[[-0.997527  ,  1.99752665]],
                              [[-0.99752653,  1.99752724]]]

Defined in /home/smola/mxnet/src/operator/instance_norm.cc:L94

Parameters:
  • data (ndarray) – An n-dimensional input array (n > 2) of the form [batch, channel, spatial_dim1, spatial_dim2, …].

  • gamma (ndarray) – A vector of length ‘channel’, which multiplies the normalized input.

  • beta (ndarray) – A vector of length ‘channel’, which is added to the product of the normalized input and the weight.

  • eps (float, optional, default=0.00100000005) – An epsilon parameter to prevent division by 0.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
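
The following plain-NumPy sketch of the formula computes the statistics per sample and per channel over the spatial axes only. It uses the documented default eps of 1e-3 and reproduces the example above to within float32 rounding. `instance_norm_reference` is a hypothetical helper name.

```python
import numpy as np

def instance_norm_reference(x, gamma, beta, eps=1e-3):
    """Hypothetical helper: normalize each (sample, channel) over spatial axes."""
    axes = tuple(range(2, x.ndim))                    # spatial axes only
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    # gamma and beta have shape (channel,); reshape so they broadcast over axis 1.
    shape = (1, -1) + (1,) * (x.ndim - 2)
    return (x - mean) / np.sqrt(var + eps) * gamma.reshape(shape) + beta.reshape(shape)

x = np.array([[[1.1, 2.2]], [[3.3, 4.4]]])          # shape (2, 1, 2)
out = instance_norm_reference(x, np.array([1.5]), np.array([0.5]))
```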

mxnet.ndarray.numpy_extension.interleaved_matmul_encdec_qk(queries=None, keys_values=None, heads=_Null, out=None, name=None, **kwargs)

Computes the matrix multiplication between the projections of queries and keys in multi-head attention used as encoder-decoder attention.

The inputs must be a tensor of query projections with the layout (seq_length, batch_size, num_heads * head_dim) and a tensor of interleaved key and value projections with the layout (seq_length, batch_size, num_heads * head_dim * 2).

The equivalent code is as follows, where num_heads is the value of heads:

q_proj = mx.nd.reshape(queries, shape=(0, 0, num_heads, -1))
q_proj = mx.nd.transpose(q_proj, axes=(1, 2, 0, 3))
q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
tmp = mx.nd.reshape(keys_values, shape=(0, 0, num_heads, 2, -1))
k_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
k_proj = mx.nd.reshape(k_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L797

Parameters:
  • queries (ndarray) – Queries

  • keys_values (ndarray) – Keys and values interleaved

  • heads (int, required) – Set number of heads

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
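
The following plain-NumPy sketch runs the same computation without MXNet. It assumes that queries are first split into (seq_length, batch_size, num_heads, head_dim), and that within each head the interleaved axis holds the key at index 0 and the value at index 1, as the reshape in the code implies. `encdec_qk_reference` is a hypothetical name.

```python
import numpy as np

def encdec_qk_reference(queries, keys_values, heads):
    """Hypothetical NumPy version of the interleaved encoder-decoder QK product."""
    q_len, batch, qdim = queries.shape
    kv_len = keys_values.shape[0]
    d = qdim // heads                                    # head_dim
    # (q_len, batch, heads, d) -> (batch, heads, q_len, d) -> (batch*heads, q_len, d)
    q = queries.reshape(q_len, batch, heads, d).transpose(1, 2, 0, 3)
    q = q.reshape(batch * heads, q_len, d) / np.sqrt(d)  # div_sqrt_dim
    # Per head, keys and values are interleaved: index 0 = key, 1 = value.
    kv = keys_values.reshape(kv_len, batch, heads, 2, d)
    k = kv[:, :, :, 0, :].transpose(1, 2, 0, 3).reshape(batch * heads, kv_len, d)
    return q @ k.transpose(0, 2, 1)                      # (batch*heads, q_len, kv_len)

rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 2, 8))        # q_len=3, batch=2, heads=2, head_dim=4
keys_values = rng.standard_normal((5, 2, 16))   # kv_len=5
scores = encdec_qk_reference(queries, keys_values, heads=2)
print(scores.shape)  # (4, 3, 5)
```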

mxnet.ndarray.numpy_extension.interleaved_matmul_encdec_valatt(keys_values=None, attention=None, heads=_Null, out=None, name=None, **kwargs)

Computes the matrix multiplication between the projections of values and the attention weights in multi-head attention used as encoder-decoder attention.

The inputs must be a tensor of interleaved key and value projections with the layout (seq_length, batch_size, num_heads * head_dim * 2) and the attention weights with the layout (batch_size * num_heads, seq_length, seq_length).

The equivalent code is as follows, where num_heads is the value of heads:

tmp = mx.nd.reshape(keys_values, shape=(0, 0, num_heads, 2, -1))
v_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
v_proj = mx.nd.reshape(v_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(attention, v_proj)
output = mx.nd.reshape(output, shape=(-1, num_heads, 0, 0), reverse=True)
output = mx.nd.transpose(output, axes=(2, 0, 1, 3))
output = mx.nd.reshape(output, shape=(0, 0, -1))

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L847

Parameters:
  • keys_values (ndarray) – Keys and values interleaved

  • attention (ndarray) – Attention maps

  • heads (int, required) – Set number of heads

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
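
The following plain-NumPy sketch shows the shapes involved. It assumes that the values sit at index 1 of the interleaved (key, value) axis, that attention has shape (batch_size * num_heads, q_len, kv_len), and that the output layout is (q_len, batch_size, num_heads * head_dim). These assumptions are derived from the reshape/batch_dot pattern rather than confirmed against the MXNet kernel. `encdec_valatt_reference` is a hypothetical name.

```python
import numpy as np

def encdec_valatt_reference(keys_values, attention, heads):
    """Hypothetical NumPy sketch: attention-weighted sum of interleaved values."""
    kv_len, batch, kvdim = keys_values.shape
    d = kvdim // (2 * heads)                             # head_dim
    q_len = attention.shape[1]
    v = keys_values.reshape(kv_len, batch, heads, 2, d)[:, :, :, 1, :]
    v = v.transpose(1, 2, 0, 3).reshape(batch * heads, kv_len, d)
    out = attention @ v                                  # (batch*heads, q_len, d)
    out = out.reshape(batch, heads, q_len, d).transpose(2, 0, 1, 3)
    return out.reshape(q_len, batch, heads * d)          # (q_len, batch, heads*d)

rng = np.random.default_rng(0)
keys_values = rng.standard_normal((5, 2, 16))   # kv_len=5, batch=2, heads=2, head_dim=4
attention = rng.standard_normal((4, 3, 5))      # (batch*heads, q_len=3, kv_len=5)
out = encdec_valatt_reference(keys_values, attention, heads=2)
print(out.shape)  # (3, 2, 8)
```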

mxnet.ndarray.numpy_extension.interleaved_matmul_selfatt_qk(queries_keys_values=None, heads=_Null, out=None, name=None, **kwargs)

Computes the matrix multiplication between the projections of queries and keys in multi-head attention used as self-attention.

The input must be a single tensor of interleaved query, key and value projections with the layout (seq_length, batch_size, num_heads * head_dim * 3).

The equivalent code is as follows, where num_heads is the value of heads:

tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
q_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
k_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
k_proj = mx.nd.reshape(k_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L694

Parameters:
  • queries_keys_values (ndarray) – Interleaved queries, keys and values

  • heads (int, required) – Set number of heads

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.interleaved_matmul_selfatt_valatt(queries_keys_values=None, attention=None, heads=_Null, out=None, name=None, **kwargs)

Computes the matrix multiplication between the projections of values and the attention weights in multi-head attention used as self-attention.

The inputs must be a tensor of interleaved query, key and value projections with the layout (seq_length, batch_size, num_heads * head_dim * 3) and the attention weights with the layout (batch_size * num_heads, seq_length, seq_length).

The equivalent code is as follows, where num_heads is the value of heads:

tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
v_proj = mx.nd.transpose(tmp[:,:,:,2,:], axes=(1, 2, 0, 3))
v_proj = mx.nd.reshape(v_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(attention, v_proj)
output = mx.nd.reshape(output, shape=(-1, num_heads, 0, 0), reverse=True)
output = mx.nd.transpose(output, axes=(2, 0, 1, 3))
output = mx.nd.reshape(output, shape=(0, 0, -1))

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L745

Parameters:
  • queries_keys_values (ndarray) – Queries, keys and values interleaved

  • attention (ndarray) – Attention maps

  • heads (int, required) – Set number of heads

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
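
The following plain-NumPy sketch mirrors the equivalent code above. It takes the value slice at index 2 of the interleaved (query, key, value) axis, and it assumes attention has shape (batch_size * num_heads, seq_length, seq_length) so that its leading axis matches the reshaped values. `selfatt_valatt_reference` is a hypothetical name.

```python
import numpy as np

def selfatt_valatt_reference(qkv, attention, heads):
    """Hypothetical NumPy sketch of the self-attention value product."""
    seq, batch, dim3 = qkv.shape
    d = dim3 // (3 * heads)                              # head_dim
    v = qkv.reshape(seq, batch, heads, 3, d)[:, :, :, 2, :]
    v = v.transpose(1, 2, 0, 3).reshape(batch * heads, seq, d)
    out = attention @ v                                  # batch_dot(attention, v_proj)
    out = out.reshape(batch, heads, seq, d).transpose(2, 0, 1, 3)
    return out.reshape(seq, batch, heads * d)            # (seq, batch, heads*d)

rng = np.random.default_rng(0)
qkv = rng.standard_normal((5, 2, 24))          # seq=5, batch=2, heads=2, head_dim=4
attention = rng.standard_normal((4, 5, 5))     # (batch*heads, seq, seq)
out = selfatt_valatt_reference(qkv, attention, heads=2)
print(out.shape)  # (5, 2, 8)
```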

mxnet.ndarray.numpy_extension.intgemm_fully_connected(data=None, weight=None, scaling=None, bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, out_type=_Null, out=None, name=None, **kwargs)

Multiplies matrices using 8-bit integers: data * weight.

Input tensor arguments are: data weight [scaling] [bias]

data: either float32 or prepared using intgemm_prepare_data (in which case it is int8).

weight: must be prepared using intgemm_prepare_weight.

scaling: present if and only if out_type is float32. If present, it is multiplied by the result before the bias is added. Typically:

  • scaling = (max passed to intgemm_prepare_weight) / 127.0 if data is float32

  • scaling = (max passed to intgemm_prepare_data) / 127.0 * (max passed to intgemm_prepare_weight) / 127.0 if data is int8

bias: present if and only if no_bias is false. It is added to the output after scaling and has the same number of columns as the output.

out_type: type of the output.

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/intgemm_fully_connected_op.cc:L284

Parameters:
  • data (ndarray) – First argument to multiplication. Tensor of float32 (quantized on the fly) or int8 from intgemm_prepare_data. If you use a different quantizer, be sure to ban -128. The last dimension must be a multiple of 64.

  • weight (ndarray) – Second argument to multiplication. Tensor of int8 from intgemm_prepare_weight. The last dimension must be a multiple of 64. The product of non-last dimensions must be a multiple of 8.

  • scaling (ndarray) – Scaling factor to apply if output type is float32.

  • bias (ndarray) – Bias term.

  • num_hidden (int, required) – Number of hidden nodes of the output.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.

  • out_type ({'float32', 'int32'},optional, default='float32') – Output data type.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
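
The second scaling formula can be checked numerically. The following sketch uses plain NumPy, not intgemm. It assumes round-to-nearest quantization clipped to [-127, 127] (the text above only says -128 is banned), and a weight of shape (num_hidden, K) used as C = data * weight^T, as described for intgemm_prepare_weight. It ignores the CPU-dependent weight layout. `quantize` is a hypothetical helper.

```python
import numpy as np

def quantize(x, maxabs):
    """Hypothetical quantizer: maxabs -> 127, round, clip to [-127, 127]."""
    return np.clip(np.rint(x * (127.0 / maxabs)), -127, 127).astype(np.int8)

rng = np.random.default_rng(0)
data = rng.uniform(-1, 1, size=(4, 64)).astype(np.float32)     # last dim multiple of 64
weight = rng.uniform(-1, 1, size=(8, 64)).astype(np.float32)   # (num_hidden, 64)
max_d, max_w = np.abs(data).max(), np.abs(weight).max()

qd, qw = quantize(data, max_d), quantize(weight, max_w)
# Integer product accumulated in int32.
acc = qd.astype(np.int32) @ qw.astype(np.int32).T
scaling = (max_d / 127.0) * (max_w / 127.0)    # the "data is int8" formula
approx = acc * scaling
exact = data @ weight.T
print(np.abs(approx - exact).max())            # small quantization error
```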

mxnet.ndarray.numpy_extension.intgemm_maxabsolute(data=None, out=None, name=None, **kwargs)

Computes the maximum absolute value of a float32 tensor quickly on a CPU. The tensor’s total size must be a multiple of 16, and the tensor must be aligned to a multiple of 64 bytes. mxnet.nd.contrib.intgemm_maxabsolute(arr) == arr.abs().max()

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/max_absolute_op.cc:L102

Parameters:
  • data (ndarray) – Tensor to compute maximum absolute value of

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.intgemm_prepare_data(data=None, maxabs=None, out=None, name=None, **kwargs)

This operator quantizes float32 values to int8 while also banning -128.

It is suitable for preparing a data matrix for use by intgemm’s C = data * weights operation.

The float32 values are scaled such that maxabs maps to 127. Typically maxabs = intgemm_maxabsolute(data).

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/prepare_data_op.cc:L112

Parameters:
  • data (ndarray) – Activation matrix to be prepared for multiplication.

  • maxabs (ndarray) – Maximum absolute value to be used for scaling. (The values will be multiplied by 127.0 / maxabs.)

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.intgemm_prepare_weight(weight=None, maxabs=None, already_quantized=_Null, out=None, name=None, **kwargs)

This operator converts a weight matrix in column-major format to intgemm’s internal fast representation of weight matrices. MXNet customarily stores weight matrices in column-major (transposed) format. This operator is not meant to be fast; it is meant to be run offline to quantize a model.

In other words, it prepares weight for the operation C = data * weight^T.

If the provided weight matrix is float32, it will be quantized first. The quantization function is (int8_t)(127.0 / max * weight), where max is provided as argument 1 (maxabs; the weight matrix is argument 0). Then the matrix is rearranged into the CPU-dependent format.

If the provided weight matrix is already int8, the matrix will only be rearranged into the CPU-dependent format. This way one can quantize with intgemm_prepare_data (which just quantizes), store to disk in a consistent format, then at load time convert to CPU-dependent format with intgemm_prepare_weight.

The internal representation depends on register length. So AVX512, AVX2, and SSSE3 have different formats. AVX512BW and AVX512VNNI have the same representation.

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/prepare_weight_op.cc:L152

Parameters:
  • weight (ndarray) – Parameter matrix to be prepared for multiplication.

  • maxabs (ndarray) – Maximum absolute value for scaling. The weights will be multiplied by 127.0 / maxabs.

  • already_quantized (boolean, optional, default=0) – Is the weight matrix already quantized?

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.intgemm_take_weight(weight=None, indices=None, out=None, name=None, **kwargs)

Index a weight matrix stored in intgemm’s weight format. The indices select the outputs of matrix multiplication, not the inner dot product dimension.

Defined in /home/smola/mxnet/src/operator/contrib/intgemm/take_weight_op.cc:L125

Parameters:
  • weight (ndarray) – Tensor already in intgemm weight format to select from

  • indices (ndarray) – indices to select on the 0th dimension of weight

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.layer_norm(data=None, gamma=None, beta=None, axis=None, eps=None, output_mean_var=None)

Layer normalization.

Normalizes the channels of the input tensor by mean and variance, and applies a scale gamma as well as offset beta.

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis and then compute the normalized output, which has the same shape as the input, as follows:

\[out = \frac{data - mean(data, axis)}{\sqrt{var(data, axis) + \epsilon}} * gamma + beta\]

Both gamma and beta are learnable parameters.

Unlike BatchNorm and InstanceNorm, the mean and var are computed along the channel dimension.

Assume the input has size k on axis 1; then both gamma and beta have shape (k,). If output_mean_var is true, the operator also outputs data_mean and data_std. Note that no gradient will be passed through these two outputs.

The parameter axis specifies which axis of the input shape denotes the ‘channel’ (separately normalized groups). The default is -1, which sets the channel axis to be the last item in the input shape.

Parameters:
  • data (NDArray) – Input data to layer normalization

  • gamma (NDArray) – gamma array

  • beta (NDArray) – beta array

  • axis (int, optional, default='-1') – The axis to perform layer normalization. Usually, this should be the axis of the channel dimension. Negative values index from the end (right to left).

  • eps (float, optional, default=9.99999975e-06) – An epsilon parameter to prevent division by 0.

  • output_mean_var (boolean, optional, default=0) – Output the mean and std calculated along the given axis.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
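
The following plain-NumPy sketch of the formula normalizes along a chosen axis, then scales and shifts by vectors of length k = data.shape[axis]. `layer_norm_reference` is a hypothetical name, not the MXNet kernel.

```python
import numpy as np

def layer_norm_reference(data, gamma, beta, axis=-1, eps=1e-5):
    """Hypothetical helper: normalize along `axis`, then scale and shift."""
    mean = data.mean(axis=axis, keepdims=True)
    var = data.var(axis=axis, keepdims=True)
    normed = (data - mean) / np.sqrt(var + eps)
    # gamma and beta have shape (k,); reshape so they line up with `axis`.
    shape = [1] * data.ndim
    shape[axis] = -1
    return normed * gamma.reshape(shape) + beta.reshape(shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4)) * 5.0 + 2.0
y = layer_norm_reference(x, np.ones(4), np.zeros(4))                        # default axis=-1
y2 = layer_norm_reference(x, np.full(3, 2.0), np.ones(3), axis=1)           # channel axis 1
```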

mxnet.ndarray.numpy_extension.leaky_relu(data=None, gamma=None, act_type='leaky', slope=0.25, lower_bound=0.125, upper_bound=0.334, **kwargs)

Applies Leaky rectified linear unit activation element-wise to the input.

Leaky ReLUs attempt to fix the “dying ReLU” problem by allowing a small slope when the input is negative and has a slope of one when input is positive.

The following modified ReLU Activation functions are supported:

  • elu: Exponential Linear Unit. y = x > 0 ? x : slope * (exp(x)-1)

  • gelu: Gaussian Error Linear Unit. y = 0.5 * x * (1 + erf(x / sqrt(2)))

  • selu: Scaled Exponential Linear Unit. y = lambda * (x > 0 ? x : alpha * (exp(x) - 1)) where lambda = 1.0507009873554804934193349852946 and alpha = 1.6732632423543772848170429916717.

  • leaky: Leaky ReLU. y = x > 0 ? x : slope * x

  • prelu: Parametric ReLU. This is same as leaky except that slope is learnt during training.

  • rrelu: Randomized ReLU. Same as leaky, but the slope is chosen uniformly at random from [lower_bound, upper_bound) during training, and fixed to (lower_bound+upper_bound)/2 for inference.

Parameters:
  • data (NDArray) – Input data to activation function.

  • gamma (NDArray) – Slope parameter input (used by prelu).

  • act_type ({'elu', 'gelu', 'leaky', 'prelu', 'rrelu', 'selu'},optional, default='leaky') – Activation function to be applied.

  • slope (float, optional, default=0.25) – Init slope for the activation. (For leaky and elu only)

  • lower_bound (float, optional, default=0.125) – Lower bound of random slope. (For rrelu only)

  • upper_bound (float, optional, default=0.333999991) – Upper bound of random slope. (For rrelu only)

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
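
The deterministic formulas listed above (leaky, elu, selu, gelu) can be sketched in plain NumPy. prelu and rrelu are omitted because their slopes are learned or random. `leaky_relu_reference` is a hypothetical name.

```python
import math
import numpy as np

SELU_LAMBDA = 1.0507009873554804934193349852946
SELU_ALPHA = 1.6732632423543772848170429916717

def leaky_relu_reference(x, act_type="leaky", slope=0.25):
    """Hypothetical helper implementing the deterministic formulas listed above."""
    x = np.asarray(x, dtype=np.float64)
    if act_type == "leaky":
        return np.where(x > 0, x, slope * x)
    if act_type == "elu":
        return np.where(x > 0, x, slope * np.expm1(x))
    if act_type == "selu":
        return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * np.expm1(x))
    if act_type == "gelu":
        erf = np.vectorize(math.erf)
        return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))
    raise ValueError("prelu/rrelu need a learned or random slope; not sketched here")

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu_reference(x).tolist())  # [-0.5, 0.0, 3.0]
```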

mxnet.ndarray.numpy_extension.log_softmax(data, axis=-1, length=None, temperature=None, use_length=False, dtype=None)

Computes the log softmax of the input. This is equivalent to computing softmax followed by log.

Parameters:
  • data (NDArray) – The input array.

  • axis (int, optional, default='-1') – The axis along which to compute softmax.

  • length (NDArray) – The length array.

  • temperature (double or None, optional, default=None) – Temperature parameter in softmax

  • dtype ({None, 'float16', 'float32', 'float64'},optional, default='None') – DType of the output in case this can’t be inferred. Defaults to the same as input’s dtype if not defined (dtype=None).

  • use_length (boolean or None, optional, default=0) – Whether to use the length input as a mask over the data input.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
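
The following plain-NumPy sketch computes log(softmax(x)) in the usual numerically stable way, by subtracting the maximum before exponentiating. Up to float32 rounding, it agrees with the Examples below. `log_softmax_reference` is a hypothetical name.

```python
import numpy as np

def log_softmax_reference(data, axis=-1, temperature=None):
    """Hypothetical helper: stable log(softmax(data / temperature))."""
    x = np.asarray(data, dtype=np.float64)
    if temperature is not None:
        x = x / temperature
    shifted = x - x.max(axis=axis, keepdims=True)   # avoid overflow in exp
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

print(np.round(log_softmax_reference([1, 2, .1]), 4).tolist())  # [-1.417, -0.417, -2.317]
```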

Examples

>>> data = np.array([1, 2, .1])
>>> npx.log_softmax(data)
array([-1.4170278, -0.4170278, -2.3170278])
>>> data = np.array([[1, 2, .1],[.1, 2, 1]])
>>> npx.log_softmax(data, axis=0)
array([[-0.34115386, -0.6931472 , -1.2411538 ],
       [-1.2411538 , -0.6931472 , -0.34115386]])

mxnet.ndarray.numpy_extension.masked_log_softmax(data, mask, axis=-1, temperature=1.0, normalize=True)

Computes the masked log softmax of the input. This is equivalent to computing masked softmax followed by log.

Parameters:
  • data (NDArray) – The input array.

  • mask (NDArray) – Mask to apply.

  • axis (int, optional, default='-1') – The axis along which to compute softmax.

  • temperature (double or None, optional, default=None) – Temperature parameter in softmax

  • normalize (boolean or None, optional, default=1) – Whether to normalize input data x: x = x - max(x)

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
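
The following plain-NumPy sketch treats masked entries as -inf before the log-sum-exp, so they come out as -inf. It also guards fully masked slices (as in the axis=0 example below) against -inf - -inf. `masked_log_softmax_reference` is a hypothetical name, and the guard is a design choice of this sketch.

```python
import numpy as np

def masked_log_softmax_reference(data, mask, axis=-1, temperature=1.0):
    """Hypothetical helper: log-softmax over unmasked entries; masked -> -inf."""
    x = np.asarray(data, dtype=np.float64) / temperature
    keep = np.broadcast_to(np.asarray(mask).astype(bool), x.shape)
    x = np.where(keep, x, -np.inf)
    m = x.max(axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)          # fully masked slice: avoid nan
    z = x - m                                     # the normalize=True shift
    s = np.exp(z).sum(axis=axis, keepdims=True)   # exp(-inf) == 0
    return z - np.log(np.where(s > 0, s, 1.0))

mask = np.array([1, 0, 1, 0, 1])
out = masked_log_softmax_reference(np.arange(5), mask)
out2 = masked_log_softmax_reference(np.arange(10).reshape((2, 5)), mask, axis=0)
```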

Examples

>>> data = np.arange(5)
>>> mask = np.array([1, 0, 1, 0, 1])
>>> npx.masked_log_softmax(data, mask)
array([-4.1429286 ,        -inf, -2.1429286 ,        -inf, -0.14292854])
>>> data = np.arange(10).reshape((2, 5))
>>> npx.masked_log_softmax(data, mask, axis=0)
array([[-5.0067153 ,        -inf, -5.0067153 ,        -inf, -5.0067153 ],
       [-0.00671535,        -inf, -0.00671535,        -inf, -0.00671535]])

mxnet.ndarray.numpy_extension.masked_softmax(data, mask, axis=-1, temperature=1.0, normalize=True)

Applies the softmax function, masking elements according to the provided mask.

Parameters:
  • data (NDArray) – The input array.

  • mask (NDArray) – Mask to apply.

  • axis (int, optional, default='-1') – The axis along which to compute softmax.

  • temperature (double or None, optional, default=None) – Temperature parameter in softmax

  • normalize (boolean or None, optional, default=1) – Whether to normalize input data x: x = x - max(x)

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
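
The following plain-NumPy sketch excludes masked entries from the normalizing sum and sets them to 0. It handles fully masked slices explicitly, so that the axis=0 example below yields zeros instead of NaN. `masked_softmax_reference` is a hypothetical name, and the guard is a design choice of this sketch.

```python
import numpy as np

def masked_softmax_reference(data, mask, axis=-1, temperature=1.0):
    """Hypothetical helper: softmax over unmasked entries; masked entries -> 0."""
    x = np.asarray(data, dtype=np.float64) / temperature
    keep = np.broadcast_to(np.asarray(mask).astype(bool), x.shape)
    x = np.where(keep, x, -np.inf)                # masked entries drop out
    m = x.max(axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)          # fully masked slice: avoid nan
    e = np.exp(x - m)                             # exp(-inf) == 0
    s = e.sum(axis=axis, keepdims=True)
    return np.where(keep, e / np.where(s > 0, s, 1.0), 0.0)

mask = np.array([1, 0, 1, 0, 1])
print(np.round(masked_softmax_reference(np.arange(5), mask), 4).tolist())
# [0.0159, 0.0, 0.1173, 0.0, 0.8668]
```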

Examples

>>> data = np.arange(5)
>>> mask = np.array([1, 0, 1, 0, 1])
>>> npx.masked_softmax(data, mask)
array([0.01587624, 0.        , 0.11731042, 0.        , 0.8668133 ])
>>> data = np.arange(10).reshape((2, 5))
>>> npx.masked_softmax(data, mask, axis=0)
array([[0.00669285, 0.        , 0.00669285, 0.        , 0.00669285],
       [0.9933072 , 0.        , 0.9933072 , 0.        , 0.9933072 ]])

mxnet.ndarray.numpy_extension.modulated_deformable_convolution(data=None, offset=None, mask=None, weight=None, bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, num_deformable_group=_Null, workspace=_Null, no_bias=_Null, im2col_step=_Null, layout=_Null, out=None, name=None, **kwargs)

Compute 2-D modulated deformable convolution on 4-D input.

The modulated deformable convolution operation is described in https://arxiv.org/abs/1811.11168

For 2-D modulated deformable convolution, the shapes are

  • data: (batch_size, channel, height, width)

  • offset: (batch_size, num_deformable_group * kernel[0] * kernel[1] * 2, height, width)

  • mask: (batch_size, num_deformable_group * kernel[0] * kernel[1], height, width)

  • weight: (num_filter, channel, kernel[0], kernel[1])

  • bias: (num_filter,)

  • out: (batch_size, num_filter, out_height, out_width).

Define:

f(x,k,p,s,d) = floor((x+2*p-d*(k-1)-1)/s)+1

then we have:

out_height=f(height, kernel[0], pad[0], stride[0], dilate[0])
out_width=f(width, kernel[1], pad[1], stride[1], dilate[1])
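
The output-size rule f and the channel counts of offset and mask can be computed directly. The following sketch uses hypothetical helper names. Python's // is floor division, matching floor(...) in f.

```python
def conv_out_size(x, k, p, s, d):
    """f(x, k, p, s, d) = floor((x + 2*p - d*(k - 1) - 1) / s) + 1."""
    return (x + 2 * p - d * (k - 1) - 1) // s + 1

def offset_mask_channels(kernel, num_deformable_group=1):
    """Channel counts of the offset and mask inputs for a 2-D kernel."""
    taps = kernel[0] * kernel[1]
    return num_deformable_group * taps * 2, num_deformable_group * taps

print(conv_out_size(32, 3, 1, 1, 1), conv_out_size(32, 3, 1, 2, 1))  # 32 16
print(offset_mask_channels((3, 3)))                                    # (18, 9)
```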

If no_bias is true, the bias term is ignored.

The default data layout is NCHW, namely (batch_size, channel, height, width).

If num_group is larger than 1, denoted by g, the input data is split evenly into g parts along the channel axis, and weight is also split evenly along its first dimension. The convolution is then computed on the i-th part of the data with the i-th part of weight. The output is obtained by concatenating all g results.

If num_deformable_group is larger than 1, denoted by dg, the input offset is split evenly into dg parts along the channel axis, and out is also split evenly into dg parts along the channel axis. The deformable convolution is then computed by applying the i-th part of the offset to the i-th part of out.

Both weight and bias are learnable parameters.

Defined in /home/smola/mxnet/src/operator/modulated_deformable_convolution.cc:L83

Parameters:
  • data (ndarray) – Input data to the ModulatedDeformableConvolutionOp.

  • offset (ndarray) – Input offset to ModulatedDeformableConvolutionOp.

  • mask (ndarray) – Input mask to the ModulatedDeformableConvolutionOp.

  • weight (ndarray) – Weight matrix.

  • bias (ndarray) – Bias parameter.

  • kernel (Shape(tuple), required) – Convolution kernel size: (h, w) or (d, h, w)

  • stride (Shape(tuple), optional, default=[]) – Convolution stride: (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (h, w) or (d, h, w). Defaults to no padding.

  • num_filter (int (non-negative), required) – Number of convolution filters (output channels).

  • num_group (int (non-negative), optional, default=1) – Number of group partitions.

  • num_deformable_group (int (non-negative), optional, default=1) – Number of deformable group partitions.

  • workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed for convolution (MB).

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • im2col_step (int (non-negative), optional, default=64) – Maximum number of images per im2col computation. The total batch size should be divisible by this value or smaller than it. If you run out of memory, try a smaller value.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW'}, optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.multibox_detection(cls_prob=None, loc_pred=None, anchor=None, clip=_Null, threshold=_Null, background_id=_Null, nms_threshold=_Null, force_suppress=_Null, variances=_Null, nms_topk=_Null, out=None, name=None, **kwargs)

Convert multibox detection predictions.

Parameters:
  • cls_prob (ndarray) – Class probabilities.

  • loc_pred (ndarray) – Location regression predictions.

  • anchor (ndarray) – Multibox prior anchor boxes

  • clip (boolean, optional, default=1) – Clip out-of-boundary boxes.

  • threshold (float, optional, default=0.00999999978) – Threshold to be a positive prediction.

  • background_id (int, optional, default='0') – Background id.

  • nms_threshold (float, optional, default=0.5) – Non-maximum suppression threshold.

  • force_suppress (boolean, optional, default=0) – Suppress all detections regardless of class_id.

  • variances (tuple of <float>, optional, default=[0.1,0.1,0.2,0.2]) – Variances to be decoded from box regression output.

  • nms_topk (int, optional, default='-1') – Keep maximum top k detections before nms, -1 for no limit.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.multibox_prior(data=None, sizes=_Null, ratios=_Null, clip=_Null, steps=_Null, offsets=_Null, out=None, name=None, **kwargs)

Generate prior (anchor) boxes from data, sizes and ratios.

Parameters:
  • data (ndarray) – Input data.

  • sizes (tuple of <float>, optional, default=[1]) – List of sizes of the generated prior boxes.

  • ratios (tuple of <float>, optional, default=[1]) – List of aspect ratios of the generated prior boxes.

  • clip (boolean, optional, default=0) – Whether to clip out-of-boundary boxes.

  • steps (tuple of <float>, optional, default=[-1,-1]) – Priorbox step across y and x, -1 for auto calculation.

  • offsets (tuple of <float>, optional, default=[0.5,0.5]) – Priorbox center offsets, y and x respectively

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.multibox_target(anchor=None, label=None, cls_pred=None, overlap_threshold=_Null, ignore_label=_Null, negative_mining_ratio=_Null, negative_mining_thresh=_Null, minimum_negative_samples=_Null, variances=_Null, out=None, name=None, **kwargs)

Compute Multibox training targets

Parameters:
  • anchor (ndarray) – Generated anchor boxes.

  • label (ndarray) – Object detection labels.

  • cls_pred (ndarray) – Class predictions.

  • overlap_threshold (float, optional, default=0.5) – Anchor-GT overlap threshold to be regarded as a positive match.

  • ignore_label (float, optional, default=-1) – Label for ignored anchors.

  • negative_mining_ratio (float, optional, default=-1) – Max negative to positive samples ratio, use -1 to disable mining

  • negative_mining_thresh (float, optional, default=0.5) – Threshold used for negative mining.

  • minimum_negative_samples (int, optional, default='0') – Minimum number of negative samples.

  • variances (tuple of <float>, optional, default=[0.1,0.1,0.2,0.2]) – Variances to be encoded in box regression target.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.nonzero(x=None, out=None, name=None, **kwargs)

Return the indices of the elements that are non-zero.

Returns a 2-D ndarray with one row per non-zero element; each row contains the indices of that element. The values in x are always tested and returned in row-major, C-style order.

Parameters:

x (ndarray) – Input array.

Returns:

array – Indices of elements that are non-zero.

Return type:

ndarray

Notes

This function differs from the original numpy.nonzero in the following aspects:
  • Does not support Python numeric scalars.

  • The return value is the same as numpy.transpose(numpy.nonzero(x)).

Examples

>>> x = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])
>>> x
array([[3, 0, 0],
       [0, 4, 0],
       [5, 6, 0]])
>>> npx.nonzero(x)
array([[0, 0],
       [1, 1],
       [2, 0],
       [2, 1]], dtype=int64)
>>> np.transpose(npx.nonzero(x))
array([[0, 1, 2, 2],
       [0, 1, 0, 1]], dtype=int64)
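The row-major scan that produces these index rows can be sketched in plain Python over nested lists. `nonzero_rows` is a hypothetical helper used only for illustration. It mirrors the example above rather than MXNet's implementation.

```python
def nonzero_rows(x):
    """Return one index row per non-zero element of a nested list,
    visiting elements in row-major (C-style) order."""
    rows = []

    def walk(node, prefix):
        if isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, prefix + [i])
        elif node != 0:
            rows.append(prefix)

    walk(x, [])
    return rows


x = [[3, 0, 0], [0, 4, 0], [5, 6, 0]]
rows = nonzero_rows(x)
print(rows)                            # [[0, 0], [1, 1], [2, 0], [2, 1]]
print([list(c) for c in zip(*rows)])   # [[0, 1, 2, 2], [0, 1, 0, 1]]
```

Transposing the rows gives the one-array-per-axis layout that numpy.nonzero returns.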
mxnet.ndarray.numpy_extension.norm(data=None, ord=_Null, axis=_Null, out_dtype=_Null, keepdims=_Null, out=None, name=None, **kwargs)

Computes the norm on an ndarray.

This operator computes the norm on an ndarray with the specified axis, depending on the value of the ord parameter. By default, it computes the L2 norm on the entire array. Currently only ord=2 supports sparse ndarrays.

Examples:

x = [[[1, 2],
      [3, 4]],
     [[2, 2],
      [5, 6]]]

norm(x, ord=2, axis=1) = [[3.1622777 4.472136 ]
                          [5.3851647 6.3245554]]

norm(x, ord=1, axis=1) = [[4., 6.],
                          [7., 8.]]

The sparse examples use the 2-D array y = [[1, 2], [3, 4]], since csr storage requires 2-D input:

rsp = y.cast_storage('row_sparse')

norm(rsp) = [5.47722578]

csr = y.cast_storage('csr')

norm(csr) = [5.47722578]
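The ord=1 and ord=2 reductions along axis=1 and the full-array L2 norm in the examples above can be reproduced with the following pure-Python sketch over nested lists. `norm_axis1` and `norm_all` are hypothetical helpers, not MXNet functions. They cover only the cases shown in the examples.

```python
import math


def norm_axis1(x, ord=2):
    """L1 or L2 norm of a 3-D nested list, reduced over axis 1."""
    if ord not in (1, 2):
        raise ValueError("only ord=1 and ord=2 are sketched here")
    result = []
    for block in x:                 # iterate over axis 0
        # zip(*block) groups the axis-1 values for each axis-2 position
        cols = zip(*block)
        if ord == 1:
            result.append([sum(abs(v) for v in col) for col in cols])
        else:
            result.append([math.sqrt(sum(v * v for v in col)) for col in cols])
    return result


def norm_all(x):
    """L2 norm over every element of a nested list (the default axis)."""
    def flat(node):
        if isinstance(node, list):
            for child in node:
                yield from flat(child)
        else:
            yield node
    return math.sqrt(sum(v * v for v in flat(x)))


x = [[[1, 2], [3, 4]], [[2, 2], [5, 6]]]
print(norm_axis1(x, ord=2))
print(norm_axis1(x, ord=1))        # [[4, 6], [7, 8]]
print(norm_all([[1, 2], [3, 4]]))  # sqrt(30)
```

The printed values are computed in float64, so they differ from the float32 results shown above only in the trailing digits.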

Defined in /home/smola/mxnet/src/operator/tensor/broadcast_reduce_norm_value.cc:L88

Parameters:
  • data (ndarray) – The input

  • ord (int, optional, default='2') – Order of the norm. Currently ord=1 and ord=2 are supported.

  • axis (Shape or None, optional, default=None) –

    The axis or axes along which to perform the reduction.

    By default, the norm is computed over all elements into a scalar array with shape (1,). If axis is int, a reduction is performed on a particular axis. If axis is a 2-tuple, it specifies the axes that hold 2-D matrices, and the matrix norms of these matrices are computed.

  • out_dtype ({None, 'float16', 'float32', 'float64', 'int32', 'int64', 'int8'},optional, default='None') – The data type of the output.

  • keepdims (boolean, optional, default=0) – If this is set to True, the reduced axis is left in the result as dimension with size one.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.one_hot(data, depth=None, on_value=1.0, off_value=0.0, dtype='float32')

Returns a one-hot array.

The locations represented by indices take value on_value, while all other locations take value off_value.

one_hot operation with indices of shape (i0, i1) and depth of d would result in an output array of shape (i0, i1, d) with:

output[i,j,:] = off_value
output[i,j,indices[i,j]] = on_value
Parameters:
  • data (NDArray) – Array of locations (indices) at which to set on_value.

  • depth (long, required) – Depth of the one hot dimension.

  • on_value (double, optional, default=1) – The value assigned to the locations represented by indices.

  • off_value (double, optional, default=0) – The value assigned to the locations not represented by indices.

  • dtype ({'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'}, optional, default='float32') – Data type of the output.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Example

>>> data = np.array([1,0,2,0])
>>> npx.one_hot(data, 3)
array([[0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [1., 0., 0.]], dtype=float64)
>>> npx.one_hot(data, 3, on_value=8, off_value=1, dtype='int32')
array([[1, 8, 1],
       [8, 1, 1],
       [1, 1, 8],
       [8, 1, 1]], dtype=int32)
>>> data = np.array([[1,0],[1,0],[2,0]])
>>> npx.one_hot(data, 3)
array([[[0., 1., 0.],
        [1., 0., 0.]],
       [[0., 1., 0.],
        [1., 0., 0.]],
       [[0., 0., 1.],
        [1., 0., 0.]]], dtype=float64)
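The two assignment rules above translate directly into a recursive pure-Python sketch. `one_hot` here is a hypothetical nested-list helper, not npx.one_hot itself. It omits dtype handling. Leaving out-of-range indices as all off_value is an assumption of the sketch, since that case is not documented above.

```python
def one_hot(indices, depth, on_value=1.0, off_value=0.0):
    """Nested-list one-hot: output[..., :] = off_value and
    output[..., indices[...]] = on_value."""
    if isinstance(indices, list):
        return [one_hot(i, depth, on_value, off_value) for i in indices]
    row = [off_value] * depth
    # Assumption of this sketch: out-of-range indices leave the row all off_value.
    if 0 <= indices < depth:
        row[indices] = on_value
    return row


print(one_hot([1, 0, 2, 0], 3))
print(one_hot([1, 0, 2, 0], 3, on_value=8, off_value=1))
```

Each nesting level of the input adds one level to the output, so input shape (i0, i1) yields output shape (i0, i1, depth).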
mxnet.ndarray.numpy_extension.pad(data=None, mode=_Null, pad_width=_Null, constant_value=_Null, out=None, name=None, **kwargs)

Pads an input array with a constant value or with the edge values of the array.

Note

Pad is deprecated. Use pad instead.

Note

Current implementation only supports 4D and 5D input arrays, with padding applied only to the spatial axes. The pad widths for the first two axes (before_1, after_1, before_2, after_2) must be zero.

This operation pads an input array with either a constant_value or edge values along each axis of the input array. The amount of padding is specified by pad_width.

pad_width is a tuple of integer padding widths for each axis of the format (before_1, after_1, ... , before_N, after_N). The pad_width should be of length 2*N where N is the number of dimensions of the array.

For dimension N of the input array, before_N and after_N indicate how many values to add before and after the elements of the array along dimension N. The widths of the first two dimensions, before_1, after_1, before_2 and after_2, must be 0.
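The following pure-Python sketch shows how the flattened pad_width maps onto per-axis (before, after) pairs. It also applies “constant” and “edge” padding to a single 2-D slice. The helpers `pad_pairs`, `padded_shape` and `pad_2d` are hypothetical. “reflect” mode is not sketched, and the check that the first two axes are unpadded follows the rule stated above.

```python
def pad_pairs(pad_width):
    """(before_1, after_1, ..., before_N, after_N) -> ((before_1, after_1), ...)."""
    if len(pad_width) % 2:
        raise ValueError("pad_width must have length 2*N")
    return tuple(zip(pad_width[0::2], pad_width[1::2]))


def padded_shape(shape, pad_width):
    """Output shape after padding; the first two axes must not be padded."""
    pairs = pad_pairs(pad_width)
    if len(pairs) != len(shape):
        raise ValueError("pad_width must have length 2 * ndim")
    if pairs[0] != (0, 0) or pairs[1] != (0, 0):
        raise ValueError("before_1, after_1, before_2, after_2 must be 0")
    return tuple(n + before + after for n, (before, after) in zip(shape, pairs))


def pad_2d(mat, top, bottom, left, right, mode="constant", constant_value=0):
    """Pad one 2-D slice (the last two axes) in 'constant' or 'edge' mode."""
    if mode not in ("constant", "edge"):
        raise ValueError("only 'constant' and 'edge' are sketched here")
    rows, cols = len(mat), len(mat[0])

    def value(i, j):
        if mode == "edge":
            # Clamp to the nearest edge element.
            return mat[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]
        if 0 <= i < rows and 0 <= j < cols:
            return mat[i][j]
        return constant_value

    return [[value(i - top, j - left) for j in range(cols + left + right)]
            for i in range(rows + top + bottom)]


print(padded_shape((2, 2, 2, 3), (0, 0, 0, 0, 1, 1, 1, 1)))  # (2, 2, 4, 5)
print(pad_2d([[1, 2, 3], [4, 5, 6]], 1, 1, 1, 1, mode="edge"))
```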

Example:

x = [[[[  1.   2.   3.]
       [  4.   5.   6.]]

      [[  7.   8.   9.]
       [ 10.  11.  12.]]]


     [[[ 11.  12.  13.]
       [ 14.  15.  16.]]

      [[ 17.  18.  19.]
       [ 20.  21.  22.]]]]

pad(x,mode="edge", pad_width=(0,0,0,0,1,1,1,1)) =

      [[[[  1.   1.   2.   3.   3.]
         [  1.   1.   2.   3.   3.]
         [  4.   4.   5.   6.   6.]
         [  4.   4.   5.   6.   6.]]

        [[  7.   7.   8.   9.   9.]
         [  7.   7.   8.   9.   9.]
         [ 10.  10.  11.  12.  12.]
         [ 10.  10.  11.  12.  12.]]]


       [[[ 11.  11.  12.  13.  13.]
         [ 11.  11.  12.  13.  13.]
         [ 14.  14.  15.  16.  16.]
         [ 14.  14.  15.  16.  16.]]

        [[ 17.  17.  18.  19.  19.]
         [ 17.  17.  18.  19.  19.]
         [ 20.  20.  21.  22.  22.]
         [ 20.  20.  21.  22.  22.]]]]

pad(x, mode="constant", constant_value=0, pad_width=(0,0,0,0,1,1,1,1)) =

      [[[[  0.   0.   0.   0.   0.]
         [  0.   1.   2.   3.   0.]
         [  0.   4.   5.   6.   0.]
         [  0.   0.   0.   0.   0.]]

        [[  0.   0.   0.   0.   0.]
         [  0.   7.   8.   9.   0.]
         [  0.  10.  11.  12.   0.]
         [  0.   0.   0.   0.   0.]]]


       [[[  0.   0.   0.   0.   0.]
         [  0.  11.  12.  13.   0.]
         [  0.  14.  15.  16.   0.]
         [  0.   0.   0.   0.   0.]]

        [[  0.   0.   0.   0.   0.]
         [  0.  17.  18.  19.   0.]
         [  0.  20.  21.  22.   0.]
         [  0.   0.   0.   0.   0.]]]]

Defined in /home/smola/mxnet/src/operator/pad.cc:L772

Parameters:
  • data (ndarray) – An n-dimensional input array.

  • mode ({'constant', 'edge', 'reflect'}, required) – Padding type to use. “constant” pads with constant_value, “edge” pads using the edge values of the input array, and “reflect” pads by reflecting values with respect to the edges.

  • pad_width (Shape(tuple), required) – Widths of the padding regions applied to the edges of each axis. It is a tuple of integer padding widths for each axis of the format (before_1, after_1, ... , before_N, after_N). It should be of length 2*N where N is the number of dimensions of the array. This is equivalent to pad_width in numpy.pad, but flattened.

  • constant_value (double, optional, default=0) – The value used for padding when mode is “constant”.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.pick(data, index, axis=-1, mode='clip', keepdims=False)

Picks elements from an input array according to the input indices along the given axis.

Given an input array of shape (d0, d1) and indices of shape (i0,), the result will be an output array of shape (i0,) with:

output[i] = input[i, indices[i]]

By default, if any index mentioned is too large, it is replaced by the index that addresses the last element along an axis (the clip mode).

This function supports n-dimensional input and (n-1)-dimensional indices arrays.

Parameters:
  • data (NDArray) – The input array

  • index (NDArray) – The index array

  • axis (int or None, optional, default='-1') – The axis along which to pick the elements. Negative values index from right to left. If None, the elements in index are picked with respect to the flattened input.

  • keepdims (boolean, optional, default=0) – If true, the axis where we pick the elements is left in the result as dimension with size one.

  • mode ({'clip', 'wrap'}, optional, default='clip') – Specify how out-of-bound indices behave. Default is “clip”. “clip” clips indices to the valid range, so any index that is too large is replaced by the index that addresses the last element along the axis. “wrap” wraps indices around.

  • out (NDArray, optional) – The output NDArray to hold the result.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Example

>>> x = np.array([[1., 2.],[3., 4.],[5., 6.]])

picks elements with specified indices along axis 0

>>> npx.pick(x, np.array([0, 1]), 0)
array([1., 4.])

picks elements with specified indices along axis 1

>>> npx.pick(x, np.array([0, 1, 0]), 1)
array([1., 4., 5.])

picks elements with specified indices along axis 1, using ‘wrap’ mode for indices that would otherwise be out of bounds

>>> npx.pick(x, np.array([2, -1, -2]), 1, mode='wrap')
array([1., 4., 5.])

picks elements with specified indices along axis 1 and dims are maintained

>>> npx.pick(x, np.array([[1.], [0.], [2.]]), 1, keepdims=True)
array([[2.],
       [3.],
       [6.]])
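The clip and wrap rules can be sketched for the 2-D case in plain Python. `pick` here is a hypothetical helper over nested lists, not npx.pick. Clamping negative indices to 0 in clip mode is an assumption, since the text above only describes indices that are too large.

```python
def pick(x, index, axis=1, mode="clip"):
    """2-D pick: x[i][index[i]] for axis=1, x[index[j]][j] for axis=0."""
    if axis not in (0, 1):
        raise ValueError("only axis 0 and 1 are sketched here")
    n = len(x[0]) if axis == 1 else len(x)

    def fix(i):
        i = int(i)
        if mode == "wrap":
            return i % n              # Python's % already wraps negatives
        return min(max(i, 0), n - 1)  # clip (negative -> 0 is an assumption)

    if axis == 1:
        return [row[fix(i)] for row, i in zip(x, index)]
    return [x[fix(i)][j] for j, i in enumerate(index)]


x = [[1., 2.], [3., 4.], [5., 6.]]
print(pick(x, [0, 1], axis=0))            # [1.0, 4.0]
print(pick(x, [2, -1, -2], mode="wrap"))  # [1.0, 4.0, 5.0]
print(pick(x, [1, 0, 2]))                 # [2.0, 3.0, 6.0]
```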
mxnet.ndarray.numpy_extension.pooling(data=None, kernel=None, stride=None, pad=None, pool_type='max', pooling_convention='valid', global_pool=False, cudnn_off=False, p_value=None, count_include_pad=None, layout=None, **kwargs)

Performs pooling on the input.

The shapes for 1-D pooling are

  • data and out: (batch_size, channel, width) (NCW layout) or (batch_size, width, channel) (NWC layout).

The shapes for 2-D pooling are

  • data and out: (batch_size, channel, height, width) (NCHW layout) or (batch_size, height, width, channel) (NHWC layout), where

    out_height = f(height, kernel[0], pad[0], stride[0])
    out_width = f(width, kernel[1], pad[1], stride[1])

The definition of f depends on pooling_convention. Two of its options are defined as follows:

  • valid (default):

    f(x, k, p, s) = floor((x+2*p-k)/s)+1

  • full, which is compatible with Caffe:

    f(x, k, p, s) = ceil((x+2*p-k)/s)+1

When global_pool is set to true, global pooling is performed: kernel is reset to (height, width) and the padding is set to 0.
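The “valid” and “full” conventions differ only in rounding, which the following sketch makes explicit. `pool_out_size` is a hypothetical helper. “same” and any other accepted values of pooling_convention are not covered. With global pooling, the kernel equals the input size and the padding is 0, so the formula gives an output size of 1.

```python
import math


def pool_out_size(x, k, p, s, convention="valid"):
    """Output length along one axis under the 'valid' or 'full' convention."""
    span = x + 2 * p - k
    if convention == "valid":
        return math.floor(span / s) + 1
    if convention == "full":      # Caffe-compatible: rounds up
        return math.ceil(span / s) + 1
    raise ValueError("only 'valid' and 'full' are sketched here")


print(pool_out_size(5, 2, 0, 2, "valid"))  # 2
print(pool_out_size(5, 2, 0, 2, "full"))   # 3
print(pool_out_size(7, 7, 0, 1))           # 1, as in global pooling
```

The two conventions agree whenever (x + 2*p - k) is divisible by s.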

Four pooling options are supported by pool_type:

  • avg: average pooling

  • max: max pooling

  • sum: sum pooling

  • lp: Lp pooling

For 3-D pooling, an additional depth dimension is added before height. Namely the input data and output will have shape (batch_size, channel, depth, height, width) (NCDHW layout) or (batch_size, depth, height, width, channel) (NDHWC layout).

Notes on Lp pooling:

Lp pooling was first introduced in this paper: https://arxiv.org/pdf/1204.3968.pdf. L-1 pooling is simply sum pooling, while L-inf pooling is simply max pooling, so Lp pooling lies between the two. In practice, the most common value for p is 2.

For each window X, the mathematical expression for Lp pooling is:

\(f(X) = \sqrt[p]{\sum_{x \in X} x^p}\)
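The formula can be evaluated directly for a single window. `lp_pool` is a hypothetical helper that follows the formula exactly as written. The statement that large p approaches max pooling holds for non-negative inputs.

```python
def lp_pool(window, p):
    """f(X) = (sum over x in X of x**p) ** (1/p) for one pooling window."""
    return sum(x ** p for x in window) ** (1.0 / p)


window = [1.0, 2.0, 3.0]
print(lp_pool(window, 1))    # 6.0, the same as sum pooling
print(lp_pool(window, 2))
print(lp_pool(window, 50))   # close to max(window) = 3.0
```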

Parameters:
  • data (NDArray) – Input data to the pooling operator.

  • kernel (Shape(tuple), optional, default=[]) – Pooling kernel size: (y, x) or (d, y, x)

  • pool_type ({'avg', 'lp', 'max', 'sum'},optional, default='max') – Pooling type to be applied.

  • global_pool (boolean, optional, default=0) – Ignore kernel size, do global pooling based on current input feature map.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn pooling and use MXNet pooling operator.

  • pooling_convention ({'full', 'same', 'valid'},optional, default='valid') – Pooling convention to be applied.

  • stride (Shape(tuple), optional, default=[]) – Stride: for pooling (y, x) or (d, y, x). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Pad for pooling: (y, x) or (d, y, x). Defaults to no padding.

  • p_value (int or None, optional, default='None') – Value of p for Lp pooling, can be 1 or 2, required for Lp Pooling.

  • count_include_pad (boolean or None, optional, default=None) – Only used for AvgPool. Specifies whether to count padding elements in the average calculation. For example, with a 5*5 kernel on a 3*3 corner of an image, the sum of the 9 valid elements is divided by 25 if this is set to true, or by 9 if it is set to false. Defaults to true.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') – Set layout for input and output. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays
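The count_include_pad example (a 5*5 kernel on a 3*3 corner) can be reproduced with a small pure-Python sketch. `avg_pool_corner` is a hypothetical helper. It assumes zero padding and evaluates only the window anchored at the top-left corner of the image.

```python
def avg_pool_corner(image, k, pad, count_include_pad=True):
    """Average over the k x k window starting `pad` cells above and to the
    left of image[0][0], with zero padding outside the image."""
    rows, cols = len(image), len(image[0])
    total, valid = 0.0, 0
    for i in range(-pad, k - pad):
        for j in range(-pad, k - pad):
            if 0 <= i < rows and 0 <= j < cols:
                total += image[i][j]
                valid += 1
    return total / (k * k if count_include_pad else valid)


image = [[1.0] * 4 for _ in range(4)]
print(avg_pool_corner(image, k=5, pad=2))                           # 9 / 25 = 0.36
print(avg_pool_corner(image, k=5, pad=2, count_include_pad=False))  # 9 / 9 = 1.0
```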

mxnet.ndarray.numpy_extension.quantized_act(data=None, min_data=None, max_data=None, act_type=_Null, out=None, name=None, **kwargs)

Activation operator for input and output data of type int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.
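The min/max thresholds describe how float32 values map onto int8 codes. The sketch below assumes a symmetric scheme, scale = 127 / max(|min|, |max|). That is a common choice but is an assumption here. It is also an assumption that relu can then act on the codes directly, because 0.0 maps to code 0. All helper names are hypothetical.

```python
def quantize_sym(values, min_range, max_range):
    """Map floats to int8 codes in [-127, 127] using scale = 127 / max(|min|, |max|)."""
    scale = 127.0 / max(abs(min_range), abs(max_range))
    codes = [max(-127, min(127, round(v * scale))) for v in values]
    return codes, scale


def dequantize_sym(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c / scale for c in codes]


def relu_on_codes(codes):
    """With a symmetric scheme 0.0 maps to code 0, so relu can clamp codes directly."""
    return [max(0, c) for c in codes]


codes, scale = quantize_sym([-1.0, 0.0, 0.25, 1.0], min_range=-1.0, max_range=1.0)
print(codes)                                      # [-127, 0, 32, 127]
print(dequantize_sym(relu_on_codes(codes), scale))
```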

Note

This operator only supports forward propagation. DO NOT use it in training. This operator only supports relu.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_activation.cc:L96

Parameters:
  • data (ndarray) – Input data.

  • min_data (ndarray) – Minimum value of data.

  • max_data (ndarray) – Maximum value of data.

  • act_type ({'log_sigmoid', 'mish', 'relu', 'sigmoid', 'softrelu', 'softsign', 'tanh'}, required) – Activation function to be applied.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_conv(data=None, weight=None, bias=None, min_data=None, max_data=None, min_weight=None, max_weight=None, min_bias=None, max_bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, workspace=_Null, no_bias=_Null, cudnn_tune=_Null, cudnn_off=_Null, layout=_Null, out=None, name=None, **kwargs)

Convolution operator for input, weight and bias of data type int8, which accumulates in type int32 for the output. For each argument, two additional float32 arguments must be provided, representing the thresholds for quantizing that argument from float32 to int8. The final outputs contain the convolution result in int32, together with min and max thresholds for quantizing the float32 output into int32.
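To see why the result is accumulated in int32 and comes with its own thresholds, consider a single dot product, the core of both convolution and fully connected layers. This sketch assumes a symmetric int8 scheme, which is not a statement of MXNet's kernel. It derives the int32 output range from the product of the two input scales. `quantize_sym` and `quantized_dot` are hypothetical names.

```python
def quantize_sym(values, real_range):
    """Symmetric float -> int8 codes; one code step equals real_range / 127."""
    scale = 127.0 / real_range
    return [max(-127, min(127, round(v * scale))) for v in values], scale


def quantized_dot(a, b, range_a, range_b):
    """int8 x int8 products summed as integers (int32 in a real kernel)."""
    qa, sa = quantize_sym(a, range_a)
    qb, sb = quantize_sym(b, range_b)
    acc = sum(x * y for x, y in zip(qa, qb))
    if not -2**31 <= acc < 2**31:
        raise OverflowError("accumulator exceeds int32")
    # One int32 step represents 1 / (sa * sb) in float, so the float range
    # covered by the int32 output is +/- (2**31 - 1) / (sa * sb).
    max_out = (2**31 - 1) / (sa * sb)
    return acc, -max_out, max_out, acc / (sa * sb)


acc, min_out, max_out, approx = quantized_dot(
    [0.5, -0.25, 1.0], [1.0, 0.5, -0.5], range_a=1.0, range_b=1.0)
print(acc, approx)   # approx is close to the float result -0.125
```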

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_conv.cc:L189

Parameters:
  • data (ndarray) – Input data.

  • weight (ndarray) – weight.

  • bias (ndarray) – bias.

  • min_data (ndarray) – Minimum value of data.

  • max_data (ndarray) – Maximum value of data.

  • min_weight (ndarray) – Minimum value of weight.

  • max_weight (ndarray) – Maximum value of weight.

  • min_bias (ndarray) – Minimum value of bias.

  • max_bias (ndarray) – Maximum value of bias.

  • kernel (Shape(tuple), required) – Convolution kernel size: (w,), (h, w) or (d, h, w)

  • stride (Shape(tuple), optional, default=[]) – Convolution stride: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (w,), (h, w) or (d, h, w). Defaults to no padding.

  • num_filter (int (non-negative), required) – Number of convolution filters (output channels).

  • num_group (int (non-negative), optional, default=1) – Number of group partitions.

  • workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed (MB) in convolution. This parameter has two usages. When CUDNN is not used, it determines the effective batch size of the convolution kernel. When CUDNN is used, it controls the maximum temporary storage used for tuning the best CUDNN kernel when the limited_workspace strategy is used.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'}, optional, default='None') – Whether to pick the convolution algorithm by running a performance test.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn for this layer.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'}, optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d. NHWC and NDHWC are only supported on GPU.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_elemwise_add(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, min_calib_range=_Null, max_calib_range=_Null, out=None, name=None, **kwargs)

elemwise_add operator for two inputs, dataA and dataB, of data type int8, which accumulates in type int32 for the output. For each argument, two additional float32 arguments must be provided, representing the thresholds for quantizing that argument from float32 to int8. The final outputs contain the result in int32, together with min and max thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Parameters:
  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • lhs (ndarray) – first input

  • rhs (ndarray) – second input

  • lhs_min (ndarray) – Minimum value of first input.

  • lhs_max (ndarray) – Maximum value of first input.

  • rhs_min (ndarray) – Minimum value of second input.

  • rhs_max (ndarray) – Maximum value of second input.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_elemwise_mul(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, min_calib_range=_Null, max_calib_range=_Null, enable_float_output=_Null, out=None, name=None, **kwargs)

Multiplies two int8 arguments element-wise.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_elemwise_mul.cc:L255

Parameters:
  • lhs (ndarray) – first input

  • rhs (ndarray) – second input

  • lhs_min (ndarray) – Minimum value of first input.

  • lhs_max (ndarray) – Maximum value of first input.

  • rhs_min (ndarray) – Minimum value of second input.

  • rhs_max (ndarray) – Maximum value of second input.

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • enable_float_output (boolean, optional, default=0) – Whether to enable float32 output

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_embedding(data=None, weight=None, min_weight=None, max_weight=None, input_dim=_Null, output_dim=_Null, dtype=_Null, sparse_grad=_Null, out=None, name=None, **kwargs)

Maps integer indices to int8 vector representations (embeddings).

Defined in /home/smola/mxnet/src/operator/quantization/quantized_indexing_op.cc:L144

Parameters:
  • data (ndarray) – The input array to the embedding operator.

  • weight (ndarray) – The embedding weight matrix.

  • min_weight (ndarray) – Minimum value of weight.

  • max_weight (ndarray) – Maximum value of weight.

  • input_dim (long, required) – Vocabulary size of the input indices.

  • output_dim (long, required) – Dimension of the embedding vectors.

  • dtype ({'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='float32') – Data type of weight.

  • sparse_grad (boolean, optional, default=0) – Compute row sparse gradient in the backward calculation. If set to True, the grad’s storage type is row_sparse.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_flatten(data=None, min_data=None, max_data=None, out=None, name=None, **kwargs)
Parameters:
  • data (ndarray) – A ndarray/symbol of type float32

  • min_data (ndarray) – The minimum scalar value possibly produced for the data

  • max_data (ndarray) – The maximum scalar value possibly produced for the data

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_fully_connected(data=None, weight=None, bias=None, min_data=None, max_data=None, min_weight=None, max_weight=None, min_bias=None, max_bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, out=None, name=None, **kwargs)

Fully connected operator for input, weight and bias of data type int8, which accumulates in type int32 for the output. For each argument, two additional float32 arguments must be provided, representing the thresholds for quantizing that argument from float32 to int8. The final outputs contain the fully connected result in int32, together with min and max thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_fully_connected.cc:L328

Parameters:
  • data (ndarray) – Input data.

  • weight (ndarray) – weight.

  • bias (ndarray) – bias.

  • min_data (ndarray) – Minimum value of data.

  • max_data (ndarray) – Maximum value of data.

  • min_weight (ndarray) – Minimum value of weight.

  • max_weight (ndarray) – Maximum value of weight.

  • min_bias (ndarray) – Minimum value of bias.

  • max_bias (ndarray) – Maximum value of bias.

  • num_hidden (int, required) – Number of hidden nodes of the output.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_npi_add(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, min_calib_range=_Null, max_calib_range=_Null, out=None, name=None, **kwargs)

elemwise_add operator for two inputs, dataA and dataB, of data type int8, which accumulates in type int32 for the output. For each argument, two additional float32 arguments must be provided, representing the thresholds for quantizing that argument from float32 to int8. The final outputs contain the result in int32, together with min and max thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Parameters:
  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • lhs (ndarray) – first input

  • rhs (ndarray) – second input

  • lhs_min (ndarray) – Minimum value of first input.

  • lhs_max (ndarray) – Maximum value of first input.

  • rhs_min (ndarray) – Minimum value of second input.

  • rhs_max (ndarray) – Maximum value of second input.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_pooling(data=None, min_data=None, max_data=None, kernel=_Null, pool_type=_Null, global_pool=_Null, cudnn_off=_Null, pooling_convention=_Null, stride=_Null, pad=_Null, p_value=_Null, count_include_pad=_Null, layout=_Null, output_size=_Null, out=None, name=None, **kwargs)

Pooling operator for input and output data of type int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.

Note

This operator only supports pool_type of avg or max. Backward propagation computes the data gradient and returns zero min/max gradients.

Defined in /home/smola/mxnet/src/operator/quantization/quantized_pooling.cc:L443

Parameters:
  • data (ndarray) – Input data.

  • min_data (ndarray) – Minimum value of data.

  • max_data (ndarray) – Maximum value of data.

  • kernel (Shape(tuple), optional, default=[]) – Pooling kernel size: (y, x) or (d, y, x)

  • pool_type ({'avg', 'lp', 'max', 'sum'},optional, default='max') – Pooling type to be applied.

  • global_pool (boolean, optional, default=0) – Ignore kernel size, do global pooling based on current input feature map.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn pooling and use MXNet pooling operator.

  • pooling_convention ({'full', 'same', 'valid'},optional, default='valid') – Pooling convention to be applied.

  • stride (Shape(tuple), optional, default=[]) – Stride: for pooling (y, x) or (d, y, x). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Pad for pooling: (y, x) or (d, y, x). Defaults to no padding.

  • p_value (int or None, optional, default='None') – Value of p for Lp pooling, can be 1 or 2, required for Lp Pooling.

  • count_include_pad (boolean or None, optional, default=None) – Only used for average pooling; specifies whether padding elements are counted in the average. For example, with a 5*5 kernel on a 3*3 corner of an image, the sum of the 9 valid elements is divided by 25 if this is set to true, or by 9 if it is set to false. Defaults to true.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'}, optional, default='None') – Set layout for input and output. Leave empty for the default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.

  • output_size (Shape or None, optional, default=None) – Only used for Adaptive Pooling. int (output size) or a tuple of int for output (height, width).

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
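The count_include_pad rule can be checked with a small pure-Python sketch of average pooling over a single window with zero padding. The helper `avg_pool_window` and the list-of-rows input are illustrative assumptions; this is not the MXNet kernel, and it ignores quantization entirely.

```python
def avg_pool_window(image, top, left, kh, kw, count_include_pad=True):
    """Average one kh x kw window whose top-left corner is (top, left).

    Positions outside `image` are treated as zero padding.
    """
    h, w = len(image), len(image[0])
    total = 0.0
    valid = 0
    for r in range(top, top + kh):
        for c in range(left, left + kw):
            if 0 <= r < h and 0 <= c < w:
                total += image[r][c]
                valid += 1
    # Divide by the full kernel area, or only by the number of real elements.
    return total / (kh * kw if count_include_pad else valid)


# A 5x5 kernel that overlaps only the 3x3 top-left corner of an image of ones
# (the window starts two rows and two columns inside the padding).
image = [[1.0] * 4 for _ in range(4)]
print(avg_pool_window(image, -2, -2, 5, 5, count_include_pad=True))   # 9 / 25
print(avg_pool_window(image, -2, -2, 5, 5, count_include_pad=False))  # 9 / 9
```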

mxnet.ndarray.numpy_extension.quantized_reshape(data=None, min_data=None, max_data=None, newshape=_Null, reverse=_Null, order=_Null, out=None, name=None, **kwargs)
Parameters:
  • data (ndarray) – Array to be reshaped.

  • min_data (ndarray) – The minimum scalar value possibly produced for the data

  • max_data (ndarray) – The maximum scalar value possibly produced for the data

  • newshape (Shape(tuple), required) – The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1; in this case, its value is inferred from the length of the array and the remaining dimensions. The values -2 to -6 have special meanings:

    • -2 copies this dimension from the input to the output shape.

    • -3 skips the current dimension if and only if its size is one.

    • -4 copies all remaining input dimensions to the output shape.

    • -5 uses the product of two consecutive input dimensions as the output dimension.

    • -6 splits one input dimension into the two dimensions that follow -6 in the new shape.

  • reverse (boolean, optional, default=0) – If true then the special values are inferred from right to left

  • order (string, optional, default='C') – Read the elements of data using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read/write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. Note that currently only C-like order is supported.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.quantized_transpose(data=None, min_data=None, max_data=None, axes=_Null, out=None, name=None, **kwargs)
Parameters:
  • data (ndarray) – Array to be transposed.

  • min_data (ndarray) – The minimum scalar value possibly produced for the data

  • max_data (ndarray) – The maximum scalar value possibly produced for the data

  • axes (Shape(tuple), optional, default=None) – By default, reverse the dimensions, otherwise permute the axes according to the values given.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.relu(data=None, out=None, name=None, **kwargs)

Computes rectified linear activation.

\[max(features, 0)\]

Defined in /home/smola/mxnet/src/operator/numpy/np_elemwise_unary_op_basic.cc:L38

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.requantize(data=None, min_range=None, max_range=None, out_type=_Null, min_calib_range=_Null, max_calib_range=_Null, out=None, name=None, **kwargs)

Given data quantized as int32 and the corresponding thresholds, requantizes the data into int8 using min and max thresholds that are either calculated at runtime or obtained from calibration. Pre-calculating the thresholds through calibration is highly recommended, since it saves the runtime cost of computing them and improves inference accuracy.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /home/smola/mxnet/src/operator/quantization/requantize.cc:L83

Parameters:
  • data (ndarray) – An ndarray/symbol of type int32.

  • min_range (ndarray) – The original minimum scalar value in the form of float32 used for quantizing data into int32.

  • max_range (ndarray) – The original maximum scalar value in the form of float32 used for quantizing data into int32.

  • out_type ({'auto', 'int8', 'uint8'}, optional, default='int8') – Output data type. auto can be specified to automatically determine output type according to min_calib_range.

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int32 data into int8.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int32 data into int8.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
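The arithmetic behind requantization can be sketched in plain Python for a single value. This sketch assumes a symmetric scheme in which an int32 code q stands for q * R / (2**31 - 1) with R = max(|min_range|, |max_range|), and int8 codes cover ±127; these conventions, the helper `requantize_value`, and the use of Python's round() are assumptions, not taken from this page.

```python
INT32_MAX = 2**31 - 1
INT8_MAX = 127


def requantize_value(q32, min_range, max_range, min_calib, max_calib):
    """Map one int32-quantized value to int8 using calibrated thresholds."""
    # Real value represented by the int32 code (assumed symmetric scheme).
    real_range = max(abs(min_range), abs(max_range))
    real = q32 * real_range / INT32_MAX
    # New int8 scale derived from the calibrated range.
    calib_range = max(abs(min_calib), abs(max_calib))
    # Python's round() is used here; the operator's rounding mode may differ.
    q8 = round(real * INT8_MAX / calib_range)
    # Values outside the calibrated range saturate at the int8 limits.
    return max(-INT8_MAX, min(INT8_MAX, q8))


print(requantize_value(-INT32_MAX, -10.0, 10.0, -40.0, 40.0))  # -32
```

With calibration, calib_range is a constant known ahead of time; without it, the operator would have to scan the data for its actual min and max on every call, which is the runtime cost the text above refers to.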

mxnet.ndarray.numpy_extension.reshape(a=None, newshape=_Null, reverse=_Null, order=_Null, out=None, name=None, **kwargs)

Gives a new shape to an array without changing its data. This function always returns a copy of the input array if out is not provided.

Parameters:
  • a (ndarray) – Array to be reshaped.

  • newshape (int or tuple of ints) –

    The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions. -2 to -6 are used for data manipulation.

    • -2 copies this dimension from the input to the output shape.

    • -3 skips the current dimension if and only if its size is one.

    • -4 copies all remaining input dimensions to the output shape.

    • -5 uses the product of two consecutive input dimensions as the output dimension.

    • -6 splits one input dimension into the two dimensions that follow -6 in the new shape.

  • reverse (bool, optional) – If set to true, the special values will be inferred from right to left.

  • order ({'C'}, optional) – Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. Other order types such as ‘F’/’A’ may be added in the future.

Returns:

reshaped_array – Always a copy of the original array. This differs from the official NumPy reshape operator, which may return a view of the original array.

Return type:

ndarray

Examples

>>> x = np.ones((2, 3, 8))
>>> npx.reshape(x, (-2, -2, 2, -1)).shape
(2, 3, 2, 4)
>>> x = np.ones((8, 3, 3, 3, 4, 4))
>>> npx.reshape(x, (-6, 2, -1, -4)).shape
(2, 4, 3, 3, 3, 4, 4)
>>> x = np.ones((8, 3, 3, 3, 4, 4))
>>> npx.reshape(x, (-5, -4)).shape
(24, 3, 3, 4, 4)
>>> x = np.ones((8, 1, 1, 1, 3))
>>> npx.reshape(x, (-2, -3, -3, -3, -2)).shape
(8, 3)
>>> x = np.ones((8, 3, 3, 3, 3, 8))
>>> npx.reshape(x, (-4, -5), reverse=True).shape
(8, 3, 3, 3, 24)
>>> x = np.ones((8, 3, 2, 4, 8))
>>> npx.reshape(x, (-4, -1, 2, -6), reverse=True).shape
(8, 3, 2, 4, 4, 2)
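To make the special values concrete, here is a pure-Python sketch of the shape inference that reproduces the examples above. It only computes the output shape; `infer_npx_reshape` is a hypothetical helper rather than part of MXNet, and details such as explicit positive sizes consuming one input dimension, or reverse=True being handled by reversing both shapes, are assumptions.

```python
from math import prod


def infer_npx_reshape(src_shape, newshape, reverse=False):
    """Pure-Python sketch of the special values -1 to -6 described above."""
    src, spec = list(src_shape), list(newshape)
    if reverse:  # infer right to left: reverse both inputs, then the result
        src.reverse()
        spec.reverse()
    out, i, j, infer_at = [], 0, 0, None  # i walks src, j walks spec
    while j < len(spec):
        v = spec[j]
        if v == -1:            # placeholder, filled in from the total size
            infer_at = len(out)
            out.append(1)
            i += 1
        elif v == -2:          # copy this input dimension
            out.append(src[i])
            i += 1
        elif v == -3:          # drop an input dimension of size one
            if src[i] != 1:
                raise ValueError("-3 requires a dimension of size 1")
            i += 1
        elif v == -4:          # copy all remaining input dimensions
            out.extend(src[i:])
            i = len(src)
        elif v == -5:          # merge two consecutive input dimensions
            out.append(src[i] * src[i + 1])
            i += 2
        elif v == -6:          # split one input dimension into the next two values
            d1, d2 = spec[j + 1], spec[j + 2]
            if d1 == -1:
                d1 = src[i] // d2
            if d2 == -1:
                d2 = src[i] // d1
            out.extend([d1, d2])
            i += 1
            j += 2
        else:                  # explicit positive size
            out.append(v)
            i += 1
        j += 1
    if infer_at is not None:
        known = prod(d for k, d in enumerate(out) if k != infer_at)
        out[infer_at] = prod(src_shape) // known
    if reverse:
        out.reverse()
    return tuple(out)


print(infer_npx_reshape((8, 3, 2, 4, 8), (-4, -1, 2, -6), reverse=True))
```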
mxnet.ndarray.numpy_extension.reshape_like(lhs=None, rhs=None, lhs_begin=_Null, lhs_end=_Null, rhs_begin=_Null, rhs_end=_Null, out=None, name=None, **kwargs)

Reshape some or all dimensions of lhs to have the same shape as some or all dimensions of rhs.

Returns a view of the lhs array with a new shape without altering any data.

Example:

x = [1, 2, 3, 4, 5, 6]
y = [[0, -4], [3, 2], [2, 2]]
reshape_like(x, y) = [[1, 2], [3, 4], [5, 6]]

More precise control over how dimensions are inherited is achieved by specifying slices over the lhs and rhs array dimensions. Only the sliced lhs dimensions are reshaped to the rhs sliced dimensions, with the non-sliced lhs dimensions staying the same.

Examples:

- lhs shape = (30,7), rhs shape = (15,2,4), lhs_begin=0, lhs_end=1, rhs_begin=0, rhs_end=2, output shape = (15,2,7)
- lhs shape = (3, 5), rhs shape = (1,15,4), lhs_begin=0, lhs_end=2, rhs_begin=1, rhs_end=2, output shape = (15,)

Negative indices are supported, and None can be used for either lhs_end or rhs_end to indicate the end of the range.

Example:

- lhs shape = (30, 12), rhs shape = (4, 2, 2, 3), lhs_begin=-1, lhs_end=None, rhs_begin=1, rhs_end=None, output shape = (30, 2, 2, 3)
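The shape rule in these examples can be sketched in pure Python. The helper below is hypothetical and only computes the output shape; treating lhs_begin/lhs_end and rhs_begin/rhs_end like Python slice bounds is an assumption that agrees with the examples above.

```python
from math import prod


def reshape_like_shape(lhs_shape, rhs_shape, lhs_begin=None, lhs_end=None,
                       rhs_begin=None, rhs_end=None):
    """Output shape of reshape_like, computed with Python slice semantics."""
    # slice.indices() resolves negative indices and None bounds.
    l_start, l_stop, _ = slice(lhs_begin, lhs_end).indices(len(lhs_shape))
    r_start, r_stop, _ = slice(rhs_begin, rhs_end).indices(len(rhs_shape))
    lhs_mid = tuple(lhs_shape[l_start:l_stop])
    rhs_mid = tuple(rhs_shape[r_start:r_stop])
    if prod(lhs_mid) != prod(rhs_mid):
        raise ValueError("sliced dimensions must hold the same number of elements")
    # Non-sliced lhs dimensions are kept; the sliced ones take the rhs sizes.
    return tuple(lhs_shape[:l_start]) + rhs_mid + tuple(lhs_shape[l_stop:])


print(reshape_like_shape((30, 12), (4, 2, 2, 3), -1, None, 1, None))  # (30, 2, 2, 3)
```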

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L629

Parameters:
  • lhs (ndarray) – First input.

  • rhs (ndarray) – Second input.

  • lhs_begin (int or None, optional, default='None') – Defaults to 0. The beginning index along which the lhs dimensions are to be reshaped. Supports negative indices.

  • lhs_end (int or None, optional, default='None') – Defaults to None. The ending index along which the lhs dimensions are to be used for reshaping. Supports negative indices.

  • rhs_begin (int or None, optional, default='None') – Defaults to 0. The beginning index along which the rhs dimensions are to be used for reshaping. Supports negative indices.

  • rhs_end (int or None, optional, default='None') – Defaults to None. The ending index along which the rhs dimensions are to be used for reshaping. Supports negative indices.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.rnn(data=None, parameters=None, state=None, state_cell=None, sequence_length=None, mode=None, state_size=None, num_layers=None, bidirectional=False, state_outputs=False, p=0.0, use_sequence_length=False, projection_size=None, lstm_state_clip_min=None, lstm_state_clip_max=None, lstm_state_clip_nan=None)

Applies recurrent layers to input data. Currently, vanilla RNN, LSTM and GRU are implemented, with both multi-layer and bidirectional support.

When the input data is of type float32 and the environment variables MXNET_CUDA_ALLOW_TENSOR_CORE and MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION are set to 1, this operator will try to use pseudo-float16 precision (float32 math with float16 I/O) in order to use Tensor Cores on suitable NVIDIA GPUs. This can sometimes give significant speedups.

Vanilla RNN

Applies a single-gate recurrent layer to input X. Two kinds of activation function are supported: ReLU and Tanh.

With ReLU activation function:

\[h_t = relu(W_{ih} * x_t + b_{ih} + W_{hh} * h_{(t-1)} + b_{hh})\]

With Tanh activation function:

\[h_t = \tanh(W_{ih} * x_t + b_{ih} + W_{hh} * h_{(t-1)} + b_{hh})\]

Reference paper: Finding structure in time - Elman, 1988. https://axon.cs.byu.edu/~martinez/classes/678/Papers/Elman_time.pdf

LSTM

Long Short-Term Memory - Hochreiter, 1997. http://www.bioinf.jku.at/publications/older/2604.pdf

\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]
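A single LSTM time step can be sketched directly from these equations. This is plain Python for an input and hidden size of 1, so every product is a scalar product; dict keys such as "ii" or "hg" name the weight W_{ii} or W_{hg}. The dict-based layout is purely illustrative and does not reflect how the operator packs its parameters vector.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step for scalar input and hidden state (hidden size 1)."""
    i = sigmoid(W["ii"] * x + b["ii"] + W["hi"] * h_prev + b["hi"])    # input gate
    f = sigmoid(W["if"] * x + b["if"] + W["hf"] * h_prev + b["hf"])    # forget gate
    g = math.tanh(W["ig"] * x + b["ig"] + W["hg"] * h_prev + b["hg"])  # cell candidate
    o = sigmoid(W["io"] * x + b["io"] + W["ho"] * h_prev + b["ho"])    # output gate
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# With all weights and biases at zero, every gate is 0.5 and g is 0,
# so the cell state is simply halved.
keys = ["ii", "hi", "if", "hf", "ig", "hg", "io", "ho"]
zeros = dict.fromkeys(keys, 0.0)
print(lstm_step(1.0, 0.0, 1.0, zeros, zeros))
```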

When projection_size is set, the LSTM uses a projection layer to reduce the number of parameters, which can give some speedups without significantly hurting accuracy.

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition - Sak et al. 2014. https://arxiv.org/abs/1402.1128

\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{ri} r_{(t-1)} + b_{ri}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{rf} r_{(t-1)} + b_{rf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{rg} r_{(t-1)} + b_{rg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ro} r_{(t-1)} + b_{ro}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ r_t = W_{hr} h_t \end{array}\end{split}\]

GRU

Gated Recurrent Unit - Cho et al. 2014. http://arxiv.org/abs/1406.1078

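The definition of GRU here differs slightly from the paper but is compatible with cuDNN: the reset gate r_t multiplies the hidden-to-hidden term (W_{hn} h_{(t-1)} + b_{hn}) after the matrix product, rather than being applied to h_{(t-1)} before it.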

\[\begin{split}\begin{array}{ll} r_t = \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \mathrm{sigmoid}(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \\ \end{array}\end{split}\]
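The cuDNN-style GRU update can be sketched in plain Python for an input and hidden size of 1. Dict keys such as "hn" name the weight W_{hn}; this layout and the helper name are illustrative assumptions and do not reflect how the operator packs its parameters vector.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, W, b):
    """One GRU step (cuDNN-style) for scalar input and hidden state."""
    r = sigmoid(W["ir"] * x + b["ir"] + W["hr"] * h_prev + b["hr"])  # reset gate
    z = sigmoid(W["iz"] * x + b["iz"] + W["hz"] * h_prev + b["hz"])  # update gate
    # The reset gate scales the whole hidden-side term, bias included.
    n = math.tanh(W["in"] * x + b["in"] + r * (W["hn"] * h_prev + b["hn"]))
    return (1.0 - z) * n + z * h_prev


zeros = dict.fromkeys(["ir", "hr", "iz", "hz", "in", "hn"], 0.0)
print(gru_step(1.0, 2.0, zeros, zeros))  # 1.0: z = 0.5 and n = 0, so h_prev is halved
```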
Parameters:
  • data (NDArray) – Input data to RNN

  • parameters (NDArray) – Vector of all RNN trainable parameters concatenated

  • state (NDArray) – initial hidden state of the RNN

  • state_cell (NDArray) – initial cell state (LSTM only)

  • sequence_length (NDArray) – Vector of valid sequence lengths for each element in batch. (Only used if use_sequence_length kwarg is True)

  • state_size (int (non-negative), required) – size of the state for each layer

  • num_layers (int (non-negative), required) – number of stacked layers

  • bidirectional (boolean, optional, default=0) – whether to use bidirectional recurrent layers

  • mode ({'gru', 'lstm', 'rnn_relu', 'rnn_tanh'}, required) – the type of RNN to compute

  • p (float, optional, default=0) – drop rate of the dropout on the outputs of each RNN layer, except the last layer.

  • state_outputs (boolean, optional, default=0) – Whether to have the states as symbol outputs.

  • projection_size (int or None, optional, default='None') – Size of the projected LSTM output r_t described in the LSTM section above.

  • lstm_state_clip_min (double or None, optional, default=None) – Minimum clip value of LSTM states. This option must be used together with lstm_state_clip_max.

  • lstm_state_clip_max (double or None, optional, default=None) – Maximum clip value of LSTM states. This option must be used together with lstm_state_clip_min.

  • lstm_state_clip_nan (boolean, optional, default=0) – Whether to stop NaN from propagating in state by clipping it to min/max. If clipping range is not specified, this option is ignored.

  • use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

mxnet.ndarray.numpy_extension.roi_pooling(data=None, rois=None, pooled_size=_Null, spatial_scale=_Null, out=None, name=None, **kwargs)

Performs region of interest (ROI) pooling on the input array.

ROI pooling is a variant of a max pooling layer, in which the output size is fixed and the region of interest is a parameter. Its purpose is to perform max pooling on inputs of non-uniform sizes to obtain fixed-size feature maps. ROI pooling is a neural-net layer mostly used in training a Fast R-CNN network for object detection.

This operator takes a 4D feature map as an input array and region proposals as rois, then it pools over sub-regions of input and produces a fixed-sized output array regardless of the ROI size.

To crop the feature map accordingly, you can resize the bounding box coordinates by changing the parameters rois and spatial_scale.

The cropped feature maps are pooled by a standard max pooling operation to the fixed output size given by pooled_size. After ROI pooling, the batch dimension of the output equals the number of regions of interest (rows of rois).

The size of each region of interest doesn’t have to be perfectly divisible by the number of pooling sections (pooled_size).

Example:

x = [[[[  0.,   1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.],
       [ 18.,  19.,  20.,  21.,  22.,  23.],
       [ 24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.],
       [ 36.,  37.,  38.,  39.,  40.,  41.],
       [ 42.,  43.,  44.,  45.,  46.,  47.]]]]

// region of interest i.e. bounding box coordinates.
y = [[0,0,0,4,4]]

// returns array of shape (2,2) according to the given roi with max pooling.
ROIPooling(x, y, (2,2), 1.0) = [[[[ 14.,  16.],
                                  [ 26.,  28.]]]]

// the region of interest changes because of the different `spatial_scale` parameter.
ROIPooling(x, y, (2,2), 0.7) = [[[[  7.,   9.],
                                  [ 19.,  21.]]]]
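The example can be reproduced with a pure-Python sketch of ROI max pooling on a single 2-D feature map. The bin arithmetic (rounding the scaled corners, treating x2/y2 as inclusive, and using floor/ceil bin edges as in the Fast R-CNN reference code) is an assumption rather than something stated on this page, and `roi_max_pool` is a hypothetical helper; it does match both outputs above.

```python
import math


def _round(v):
    # Round half up; adequate for the non-negative coordinates used here.
    return math.floor(v + 0.5)


def roi_max_pool(fmap, roi, pooled_size, spatial_scale):
    """ROI max pooling on one feature map given as a list of rows."""
    _, x1, y1, x2, y2 = roi  # batch_index is ignored in this single-map sketch
    H, W = len(fmap), len(fmap[0])
    ph_n, pw_n = pooled_size
    start_w, start_h = _round(x1 * spatial_scale), _round(y1 * spatial_scale)
    end_w, end_h = _round(x2 * spatial_scale), _round(y2 * spatial_scale)
    roi_h = max(end_h - start_h + 1, 1)  # corners are inclusive
    roi_w = max(end_w - start_w + 1, 1)
    bin_h, bin_w = roi_h / ph_n, roi_w / pw_n
    out = []
    for ph in range(ph_n):
        hs = min(max(math.floor(ph * bin_h) + start_h, 0), H)
        he = min(max(math.ceil((ph + 1) * bin_h) + start_h, 0), H)
        row = []
        for pw in range(pw_n):
            ws = min(max(math.floor(pw * bin_w) + start_w, 0), W)
            we = min(max(math.ceil((pw + 1) * bin_w) + start_w, 0), W)
            vals = [fmap[h][w] for h in range(hs, he) for w in range(ws, we)]
            row.append(max(vals) if vals else 0.0)  # empty bins give 0
        out.append(row)
    return out


fmap = [[float(6 * r + c) for c in range(6)] for r in range(8)]  # the 8x6 map x above
print(roi_max_pool(fmap, [0, 0, 0, 4, 4], (2, 2), 1.0))  # [[14.0, 16.0], [26.0, 28.0]]
print(roi_max_pool(fmap, [0, 0, 0, 4, 4], (2, 2), 0.7))  # [[7.0, 9.0], [19.0, 21.0]]
```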

Defined in /home/smola/mxnet/src/operator/roi_pooling.cc:L217

Parameters:
  • data (ndarray) – The input array to the pooling operator, a 4-D feature map.

  • rois (ndarray) – Bounding box coordinates, a 2D array of [[batch_index, x1, y1, x2, y2]], where (x1, y1) and (x2, y2) are top left and bottom right corners of designated region of interest. batch_index indicates the index of corresponding image in the input array

  • pooled_size (Shape(tuple), required) – ROI pooling output shape (h,w)

  • spatial_scale (float, required) – Ratio of the input feature map height (or width) to the raw image height (or width). Equals the reciprocal of the total stride of the convolutional layers.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.round_ste(data=None, out=None, name=None, **kwargs)

Straight-through-estimator of round().

In the forward pass, returns the input rounded element-wise to the nearest integer (same as round()).

In the backward pass, returns gradients of 1 everywhere (instead of 0 everywhere as in round()): \(\frac{d}{dx}{round\_ste(x)} = 1\) vs. \(\frac{d}{dx}{round(x)} = 0\). This is useful for quantized training.

Reference: Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.

Example:

x = round_ste([-1.5, 1.5, -1.9, 1.9, 2.7])
x.backward()
x = [-2., 2., -2., 2., 3.]
x.grad() = [1., 1., 1., 1., 1.]
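In plain Python, a straight-through estimator amounts to a forward function that rounds and a backward function that passes the incoming gradient through unchanged (multiplied by the surrogate derivative 1). The helper names are hypothetical and this is not how MXNet's autograd registers the operator; the half-away-from-zero rounding is chosen to match the example above, where -1.5 becomes -2.

```python
import math


def round_half_away(v):
    """Round to the nearest integer, with halves going away from zero."""
    return math.copysign(math.floor(abs(v) + 0.5), v)


def round_ste_forward(xs):
    return [round_half_away(v) for v in xs]


def round_ste_backward(upstream_grads):
    # Surrogate derivative d/dx round_ste(x) = 1, so the chain rule
    # leaves the upstream gradient unchanged.
    return [g * 1.0 for g in upstream_grads]


x = [-1.5, 1.5, -1.9, 1.9, 2.7]
print(round_ste_forward(x))                # [-2.0, 2.0, -2.0, 2.0, 3.0]
print(round_ste_backward([1.0] * len(x)))  # [1.0, 1.0, 1.0, 1.0, 1.0]
```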

The storage type of round_ste output depends upon the input storage type:
  • round_ste(default) = default

  • round_ste(row_sparse) = row_sparse

  • round_ste(csr) = csr

Defined in /home/smola/mxnet/src/operator/contrib/stes_op.cc:L54

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.scalar_poisson(lam=_Null, shape=_Null, ctx=_Null, dtype=_Null, out=None, name=None, **kwargs)

Draw random samples from a Poisson distribution.

Samples are distributed according to a Poisson distribution parametrized by lambda (rate). Samples will always be returned as a floating point data type.

Example:

poisson(lam=4, shape=(2,2)) = [[ 5.,  2.],
                               [ 4.,  6.]]
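For intuition about what the samples represent, here is a pure-Python Poisson sampler using Knuth's multiplication method. It is a textbook algorithm chosen for brevity, not the sampler MXNet uses, and it becomes slow for large lam; like the operator, it returns floating-point values.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's multiplication method; practical only for small lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return float(k)  # returned as a float, like the operator's output
        k += 1


rng = random.Random(0)
print([[poisson_sample(4.0, rng) for _ in range(2)] for _ in range(2)])
```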

Defined in /home/smola/mxnet/src/operator/random/sample_op.cc:L152

Parameters:
  • lam (float, optional, default=1) – Lambda parameter (rate) of the Poisson distribution.

  • shape (Shape(tuple), optional, default=None) – Shape of the output.

  • ctx (string, optional, default='') – Context of output, in format [cpu|gpu|cpu_pinned](n). Only used for imperative calls.

  • dtype ({'None', 'bfloat16', 'float16', 'float32', 'float64'}, optional, default='None') – DType of the output in case this can’t be inferred. Defaults to float32 if not defined (dtype=None).

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sequence_last(data=None, sequence_length=None, use_sequence_length=_Null, axis=_Null, out=None, name=None, **kwargs)

Takes the last element of a sequence.

This function takes an n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] and returns an (n-1)-dimensional array of the form [batch_size, other_feature_dims].

Parameter sequence_length is used to handle variable-length sequences. sequence_length should be an input array of positive ints of dimension [batch_size]. To use this parameter, set use_sequence_length to True, otherwise each example in the batch is assumed to have the max sequence length.

Note

Alternatively, you can use the take operator.

Example:

x = [[[  1.,   2.,   3.],
      [  4.,   5.,   6.],
      [  7.,   8.,   9.]],

     [[ 10.,   11.,   12.],
      [ 13.,   14.,   15.],
      [ 16.,   17.,   18.]],

     [[  19.,   20.,   21.],
      [  22.,   23.,   24.],
      [  25.,   26.,   27.]]]

// returns last sequence when sequence_length parameter is not used
SequenceLast(x) = [[  19.,   20.,   21.],
                   [  22.,   23.,   24.],
                   [  25.,   26.,   27.]]

// sequence_length is used
SequenceLast(x, sequence_length=[1,1,1], use_sequence_length=True) =
         [[  1.,   2.,   3.],
          [  4.,   5.,   6.],
          [  7.,   8.,   9.]]

// sequence_length is used
SequenceLast(x, sequence_length=[1,2,3], use_sequence_length=True) =
         [[  1.,    2.,   3.],
          [  13.,  14.,  15.],
          [  25.,  26.,  27.]]
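The selection rule can be sketched in pure Python on nested lists laid out as [time][batch][feature], which corresponds to axis=0. The function is a hypothetical helper for illustration, not the operator itself; it reproduces the examples above.

```python
def sequence_last(x, sequence_length=None):
    """x is nested lists indexed [time][batch][feature] (axis=0)."""
    max_len, batch = len(x), len(x[0])
    if sequence_length is None:
        sequence_length = [max_len] * batch  # every sequence uses all steps
    # Take the element at time step length - 1 for each batch entry.
    return [x[sequence_length[b] - 1][b] for b in range(batch)]


# Same values as the 3x3x3 example above.
x = [[[9 * t + 3 * b + f + 1 for f in range(3)] for b in range(3)] for t in range(3)]
print(sequence_last(x, [1, 2, 3]))  # [[1, 2, 3], [13, 14, 15], [25, 26, 27]]
```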

Defined in /home/smola/mxnet/src/operator/sequence_last.cc:L103

Parameters:
  • data (ndarray) – n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] where n>2

  • sequence_length (ndarray) – vector of sequence lengths of the form [batch_size]

  • use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence

  • axis (int, optional, default='0') – The sequence axis. Only values of 0 and 1 are currently supported.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sequence_mask(data=None, sequence_length=None, use_sequence_length=_Null, value=_Null, axis=_Null, out=None, name=None, **kwargs)

Sets all elements outside the sequence to a constant value.

This function takes an n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] and returns an array of the same shape.

Parameter sequence_length is used to handle variable-length sequences. sequence_length should be an input array of positive ints of dimension [batch_size]. To use this parameter, set use_sequence_length to True, otherwise each example in the batch is assumed to have the max sequence length and this operator works as the identity operator.

Example:

x = [[[  1.,   2.,   3.],
      [  4.,   5.,   6.]],

     [[  7.,   8.,   9.],
      [ 10.,  11.,  12.]],

     [[ 13.,  14.,   15.],
      [ 16.,  17.,   18.]]]

// Batch 1
B1 = [[  1.,   2.,   3.],
      [  7.,   8.,   9.],
      [ 13.,  14.,  15.]]

// Batch 2
B2 = [[  4.,   5.,   6.],
      [ 10.,  11.,  12.],
      [ 16.,  17.,  18.]]

// works as identity operator when sequence_length parameter is not used
SequenceMask(x) = [[[  1.,   2.,   3.],
                    [  4.,   5.,   6.]],

                   [[  7.,   8.,   9.],
                    [ 10.,  11.,  12.]],

                   [[ 13.,  14.,   15.],
                    [ 16.,  17.,   18.]]]

// sequence_length [1,1] means only the first step of each batch element is kept
// and the other rows are masked with the default mask value 0
SequenceMask(x, sequence_length=[1,1], use_sequence_length=True) =
             [[[  1.,   2.,   3.],
               [  4.,   5.,   6.]],

              [[  0.,   0.,   0.],
               [  0.,   0.,   0.]],

              [[  0.,   0.,   0.],
               [  0.,   0.,   0.]]]

// sequence_length [2,3] means the first 2 steps of batch B1 and all 3 steps of batch B2 are kept
// and the other rows are masked with value = 1
SequenceMask(x, sequence_length=[2,3], use_sequence_length=True, value=1) =
             [[[  1.,   2.,   3.],
               [  4.,   5.,   6.]],

              [[  7.,   8.,   9.],
               [  10.,  11.,  12.]],

              [[   1.,   1.,   1.],
               [  16.,  17.,  18.]]]
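A pure-Python sketch of the masking rule, again on nested lists indexed [time][batch][feature] (axis=0). The helper is hypothetical and illustrative only; it reproduces the last example above.

```python
def sequence_mask(x, sequence_length=None, value=0.0):
    """x is nested lists indexed [time][batch][feature] (axis=0)."""
    if sequence_length is None:
        return x  # identity when no lengths are given
    # Keep step t of batch entry b only if t < sequence_length[b].
    return [
        [row if t < sequence_length[b] else [value] * len(row)
         for b, row in enumerate(step)]
        for t, step in enumerate(x)
    ]


# Same values as the 3x2x3 example above.
x = [[[6 * t + 3 * b + f + 1 for f in range(3)] for b in range(2)] for t in range(3)]
print(sequence_mask(x, [2, 3], value=1))
```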

Defined in /home/smola/mxnet/src/operator/sequence_mask.cc:L186

Parameters:
  • data (ndarray) – n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] where n>2

  • sequence_length (ndarray) – vector of sequence lengths of the form [batch_size]

  • use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence

  • value (float, optional, default=0) – The value to be used as a mask.

  • axis (int, optional, default='0') – The sequence axis. Only values of 0 and 1 are currently supported.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sequence_reverse(data=None, sequence_length=None, use_sequence_length=_Null, axis=_Null, out=None, name=None, **kwargs)

Reverses the elements of each sequence.

This function takes an n-dimensional input array of the form [max_sequence_length, batch_size, other_feature_dims] and returns an array of the same shape.

Parameter sequence_length is used to handle variable-length sequences. sequence_length should be an input array of positive ints of dimension [batch_size]. To use this parameter, set use_sequence_length to True, otherwise each example in the batch is assumed to have the max sequence length.

Example:

x = [[[  1.,   2.,   3.],
      [  4.,   5.,   6.]],

     [[  7.,   8.,   9.],
      [ 10.,  11.,  12.]],

     [[ 13.,  14.,   15.],
      [ 16.,  17.,   18.]]]

// Batch 1
B1 = [[  1.,   2.,   3.],
      [  7.,   8.,   9.],
      [ 13.,  14.,  15.]]

// Batch 2
B2 = [[  4.,   5.,   6.],
      [ 10.,  11.,  12.],
      [ 16.,  17.,  18.]]

// returns reverse sequence when sequence_length parameter is not used
SequenceReverse(x) = [[[ 13.,  14.,   15.],
                       [ 16.,  17.,   18.]],

                      [[  7.,   8.,   9.],
                       [ 10.,  11.,  12.]],

                      [[  1.,   2.,   3.],
                       [  4.,   5.,   6.]]]

// sequence_length [2,2] means the first 2 steps of
// both batch B1 and B2 are reversed.
SequenceReverse(x, sequence_length=[2,2], use_sequence_length=True) =
                  [[[  7.,   8.,   9.],
                    [ 10.,  11.,  12.]],

                   [[  1.,   2.,   3.],
                    [  4.,   5.,   6.]],

                   [[ 13.,  14.,   15.],
                    [ 16.,  17.,   18.]]]

// sequence_length [2,3] means the first 2 steps of batch B1 and all 3 steps of batch B2
// are reversed.
SequenceReverse(x, sequence_length=[2,3], use_sequence_length=True) =
                 [[[  7.,   8.,   9.],
                   [ 16.,  17.,  18.]],

                  [[  1.,   2.,   3.],
                   [ 10.,  11.,  12.]],

                  [[ 13.,  14.,  15.],
                   [  4.,   5.,   6.]]]
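A pure-Python sketch of the reversal rule on nested lists indexed [time][batch][feature] (axis=0). Only the first sequence_length[b] steps of each batch entry are reversed; later steps stay in place. The helper is hypothetical and illustrative only; it reproduces the examples above.

```python
def sequence_reverse(x, sequence_length=None):
    """x is nested lists indexed [time][batch][feature] (axis=0)."""
    max_len, batch = len(x), len(x[0])
    if sequence_length is None:
        sequence_length = [max_len] * batch
    out = [[list(x[t][b]) for b in range(batch)] for t in range(max_len)]
    for b in range(batch):
        n = sequence_length[b]
        for t in range(n):  # reverse the valid prefix; padding steps are untouched
            out[t][b] = list(x[n - 1 - t][b])
    return out


# Same values as the 3x2x3 example above.
x = [[[6 * t + 3 * b + f + 1 for f in range(3)] for b in range(2)] for t in range(3)]
print(sequence_reverse(x, [2, 3]))
```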

Defined in /home/smola/mxnet/src/operator/sequence_reverse.cc:L118

Parameters:
  • data (ndarray) – n-dimensional input array of the form [max_sequence_length, batch_size, other dims] where n>2

  • sequence_length (ndarray) – vector of sequence lengths of the form [batch_size]

  • use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence

  • axis (int, optional, default='0') – The sequence axis. Only 0 is currently supported.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.shape_array(data=None, out=None, name=None, **kwargs)

Returns a 1D int64 array containing the shape of data.

Example:

shape_array([[1,2,3,4], [5,6,7,8]]) = [2,4]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L698

Parameters:
  • data (ndarray) – Input Array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sigmoid(data=None, out=None, name=None, **kwargs)

Computes sigmoid of x element-wise.

\[y = 1 / (1 + exp(-x))\]
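Evaluated literally, this formula overflows for large negative x in plain Python (math.exp(1000) raises OverflowError). A common remedy, sketched below, switches to the algebraically equivalent form exp(x) / (1 + exp(x)) when x < 0. Whether MXNet's kernel uses this exact formulation is not stated here; the helper is illustrative.

```python
import math


def stable_sigmoid(x):
    """1 / (1 + exp(-x)) without overflowing exp() for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so exp(x) is at most 1
    return e / (1.0 + e)


print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0
```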

Defined in /home/smola/mxnet/src/operator/numpy/np_elemwise_unary_op_basic.cc:L49

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sign_ste(data=None, out=None, name=None, **kwargs)

Straight-through-estimator of sign().

In the forward pass, returns the element-wise sign of the input (same as sign()).

In the backward pass, returns gradients of 1 everywhere (instead of 0 everywhere as in sign()): \(\frac{d}{dx}{sign\_ste(x)} = 1\) vs. \(\frac{d}{dx}{sign(x)} = 0\). This is useful for quantized training.

Reference: Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.

Example:

x = sign_ste([-2, 0, 3])
x.backward()
x = [-1., 0., 1.]
x.grad() = [1., 1., 1.]

The storage type of sign_ste output depends upon the input storage type:
  • sign_ste(default) = default

  • sign_ste(row_sparse) = row_sparse

  • sign_ste(csr) = csr

Defined in /home/smola/mxnet/src/operator/contrib/stes_op.cc:L80

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sldwin_atten_context(score=None, value=None, dilation=None, w=_Null, symmetric=_Null, out=None, name=None, **kwargs)

Compute the context vector for sliding window attention, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

In this attention pattern, given a fixed window size 2w, each token attends to w tokens on the left side if we use causal attention (setting symmetric to False), otherwise each token attends to w tokens on each side.

The shapes of the inputs are:

  • score : (batch_size, seq_length, num_heads, w + w + 1) if symmetric is True, or (batch_size, seq_length, num_heads, w + 1) otherwise.

  • value : (batch_size, seq_length, num_heads, num_head_units)

  • dilation : (num_heads,)

The shape of the output is:

  • context_vec : (batch_size, seq_length, num_heads, num_head_units)
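For one batch element and one head, the context computation can be sketched in pure Python as a weighted sum of the values inside each token's window. Two assumptions are not stated on this page: that score column k corresponds to the key at offset (k - w) * dilation from the query (columns running from -w to +w, or -w to 0 when causal), and that window slots falling outside the sequence contribute nothing. The helpers are hypothetical.

```python
def window_positions(i, w, dilation, symmetric):
    """Key positions covered by query i, one per score column (assumed order)."""
    offsets = range(-w, w + 1) if symmetric else range(-w, 1)
    return [i + k * dilation for k in offsets]


def sldwin_context(score, value, w, dilation, symmetric):
    """score: [seq][window] weights, value: [seq][units]; one batch and head."""
    seq_len, units = len(value), len(value[0])
    context = []
    for i in range(seq_len):
        acc = [0.0] * units
        for s, j in zip(score[i], window_positions(i, w, dilation, symmetric)):
            if 0 <= j < seq_len:  # out-of-range window slots are skipped
                for u in range(units):
                    acc[u] += s * value[j][u]
        context.append(acc)
    return context


value = [[1.0], [2.0], [3.0], [4.0]]
print(sldwin_context([[1.0] * 3] * 4, value, w=1, dilation=1, symmetric=True))
```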

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L1045

Parameters:
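  • score (ndarray) – Sliding window attention score, with the shape given above.

  • value (ndarray) – Value tensor, of shape (batch_size, seq_length, num_heads, num_head_units).

  • dilation (ndarray) – Per-head dilation of the sliding window, of shape (num_heads,).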

  • w (int, required) – The one-sided window length

  • symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sldwin_atten_mask_like(score=None, dilation=None, valid_length=None, w=_Null, symmetric=_Null, out=None, name=None, **kwargs)

Compute the mask for the sliding window attention score, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

In this attention pattern, given a fixed window size 2w, each token attends to w tokens on the left side if we use causal attention (setting symmetric to False), otherwise each token attends to w tokens on each side.

The shapes of the inputs are:

  • score : (batch_size, seq_length, num_heads, w + w + 1) if symmetric is True, or (batch_size, seq_length, num_heads, w + 1) otherwise

  • dilation : (num_heads,)

  • valid_length : (batch_size,)

The shape of the output is:

  • mask : same as the shape of score

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L909

Parameters:
  • score (ndarray) – sliding window attention score

  • dilation (ndarray) – dilation

  • valid_length (ndarray) – valid length

  • w (int, required) – The one-sided window length

  • symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
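A plain-Python sketch of the mask for one (batch, head) slice follows. It assumes the same slot layout as the score (slot j of query i covers position i + (j - w) * dilation). It also assumes that a slot is kept (1.0) only when both the query position and the covered position lie inside valid_length. The helper names `window_positions` and `sldwin_mask_ref` are hypothetical.

```python
def window_positions(i, w, dilation, symmetric):
    """Sequence positions covered by the window of query i (assumed slot layout)."""
    num_slots = 2 * w + 1 if symmetric else w + 1
    return [i + (j - w) * dilation for j in range(num_slots)]


def sldwin_mask_ref(seq_length, valid_length, w, dilation, symmetric):
    """1.0 where query i and its window slot both fall inside valid_length, else 0.0."""
    mask = []
    for i in range(seq_length):
        row = []
        for p in window_positions(i, w, dilation, symmetric):
            keep = i < valid_length and 0 <= p < valid_length
            row.append(1.0 if keep else 0.0)
        mask.append(row)
    return mask


for row in sldwin_mask_ref(4, 3, w=1, dilation=1, symmetric=True):
    print(row)
```

The last query (position 3) lies past valid_length = 3, so its whole row is masked out.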

mxnet.ndarray.numpy_extension.sldwin_atten_score(query=None, key=None, dilation=None, w=_Null, symmetric=_Null, out=None, name=None, **kwargs)

Compute the sliding window attention score, which is used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

In this attention pattern, each token attends to itself and the w tokens on its left when causal attention is used (symmetric set to False). Otherwise, it attends to itself and w tokens on each side, a window of 2w + 1 positions.

The shapes of the inputs are:

  • query : (batch_size, seq_length, num_heads, num_head_units)

  • key : (batch_size, seq_length, num_heads, num_head_units)

  • dilation : (num_heads,)

The shape of the output is:

  • score : (batch_size, seq_length, num_heads, w + w + 1) if symmetric is True, or (batch_size, seq_length, num_heads, w + 1) otherwise

Defined in /home/smola/mxnet/src/operator/contrib/transformer.cc:L969

Parameters:
  • query (ndarray) – query

  • key (ndarray) – key

  • dilation (ndarray) – dilation

  • w (int, required) – The one-sided window length

  • symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
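The score layout can be sketched in plain Python for a single (batch, head) slice. The sketch assumes that slot j of query i holds the dot product of query[i] with key[i + (j - w) * dilation], and that slots outside the sequence are left at 0. This is an illustration of the indexing, not MXNet's kernel. The helper names `window_positions` and `sldwin_score_ref` are hypothetical.

```python
def window_positions(i, w, dilation, symmetric):
    """Sequence positions covered by the window of query i (assumed slot layout)."""
    num_slots = 2 * w + 1 if symmetric else w + 1
    return [i + (j - w) * dilation for j in range(num_slots)]


def sldwin_score_ref(query, key, w, dilation, symmetric):
    """Scores for one (batch, head) slice; query and key are seq_length x units lists."""
    seq_length = len(query)
    score = []
    for i in range(seq_length):
        row = []
        for p in window_positions(i, w, dilation, symmetric):
            if 0 <= p < seq_length:
                row.append(sum(q * k for q, k in zip(query[i], key[p])))
            else:
                row.append(0.0)  # slot outside the sequence (assumed to stay 0)
        score.append(row)
    return score


query = [[1.0], [2.0], [3.0]]
key = [[1.0], [10.0], [100.0]]
print(sldwin_score_ref(query, key, w=1, dilation=1, symmetric=True))
# [[0.0, 1.0, 10.0], [2.0, 20.0, 200.0], [30.0, 300.0, 0.0]]
```

Each row has w + w + 1 entries when symmetric is True and w + 1 otherwise, matching the output shapes listed above.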

mxnet.ndarray.numpy_extension.slice(data=None, begin=_Null, end=_Null, step=_Null, out=None, name=None, **kwargs)

Slices a region of the array.

Note

crop is deprecated. Use slice instead.

This function returns a sliced array between the indices given by begin and end with the corresponding step.

For an input array of shape=(d_0, d_1, ..., d_n-1), a slice operation with begin=(b_0, b_1, ..., b_m-1), end=(e_0, e_1, ..., e_m-1), and step=(s_0, s_1, ..., s_m-1), where m <= n, results in an array with the shape (ceil(|e_0-b_0|/|s_0|), ..., ceil(|e_m-1-b_m-1|/|s_m-1|), d_m, ..., d_n-1). The resulting array’s k-th dimension contains elements from the k-th dimension of the input array, starting from index b_k (inclusive) with step s_k until reaching e_k (exclusive).

If the k-th element of begin, end, or step is None, the following rules set the default values:

  • If s_k is None, set s_k=1.

  • If s_k > 0, set b_k=0 and e_k=d_k; otherwise, set b_k=d_k-1 and e_k=-1 (so the slice runs through index 0).

The storage type of the slice output depends on the storage types of the inputs:

  • slice(csr) = csr

  • otherwise, slice generates output with default storage

Note

When input data storage type is csr, it only supports step=(), or step=(None,), or step=(1,) to generate a csr output. For other step parameter values, it falls back to slicing a dense tensor.

Example:

x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]
slice(x, begin=(0,1), end=(2,4)) = [[ 2.,  3.,  4.],
                                   [ 6.,  7.,  8.]]
slice(x, begin=(None, 0), end=(None, 3), step=(-1, 2)) = [[9., 11.],
                                                          [5.,  7.],
                                                          [1.,  3.]]

Defined in /home/smola/mxnet/src/operator/tensor/matrix_op.cc:L535

Parameters:
  • data (ndarray) – Source input

  • begin (tuple of int or None, required) – starting indices for the slice operation, supports negative indices.

  • end (tuple of int or None, required) – ending indices for the slice operation, supports negative indices.

  • step (tuple of int or None, optional, default=[]) – step for the slice operation, supports negative values.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
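For dense data, the default-value rules above behave like Python's built-in slice objects. A None start or stop is resolved from the sign of the step, and a None stop with a negative step runs through index 0, which is what the internal e_k=-1 denotes. The sketch below uses a hypothetical helper `slice_nested` on nested lists and reproduces the two examples above. It illustrates the semantics only and is not MXNet's implementation.

```python
def slice_nested(x, begin, end, step=()):
    """Slice a nested list along its leading len(begin) axes; later axes are kept whole."""
    if not begin:
        return x
    s = step[0] if step else None
    rest_step = step[1:] if step else ()
    # Python's slice applies the same None defaults: step 1, and start/stop
    # chosen from the sign of the step.
    picked = x[slice(begin[0], end[0], s)]
    return [slice_nested(item, begin[1:], end[1:], rest_step) for item in picked]


x = [[1., 2., 3., 4.],
     [5., 6., 7., 8.],
     [9., 10., 11., 12.]]
print(slice_nested(x, (0, 1), (2, 4)))                      # [[2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
print(slice_nested(x, (None, 0), (None, 3), step=(-1, 2)))  # [[9.0, 11.0], [5.0, 7.0], [1.0, 3.0]]
```

The length of each sliced axis equals len(range(b_k, e_k, s_k)), that is ceil(|e_k - b_k| / |s_k|). For example, begin 0, end 3, step 2 keeps 2 elements.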

mxnet.ndarray.numpy_extension.slice_channel(data=None, num_outputs=_Null, axis=_Null, squeeze_axis=_Null, out=None, name=None, **kwargs)

Splits an array along a particular axis into multiple sub-arrays.

Note

SliceChannel is deprecated. Use split instead.

Note that num_outputs should evenly divide the length of the axis along which to split the array.

Example:

x  = [[[ 1.]
       [ 2.]]
      [[ 3.]
       [ 4.]]
      [[ 5.]
       [ 6.]]]
x.shape = (3, 2, 1)

y = split(x, axis=1, num_outputs=2) // a list of 2 arrays with shape (3, 1, 1)
y = [[[ 1.]]
     [[ 3.]]
     [[ 5.]]]

    [[[ 2.]]
     [[ 4.]]
     [[ 6.]]]

y[0].shape = (3, 1, 1)

z = split(x, axis=0, num_outputs=3) // a list of 3 arrays with shape (1, 2, 1)
z = [[[ 1.]
      [ 2.]]]

    [[[ 3.]
      [ 4.]]]

    [[[ 5.]
      [ 6.]]]

z[0].shape = (1, 2, 1)

squeeze_axis=1 removes the axis of length 1 from the shapes of the output arrays. Note that it only removes the axis along which the array is split. Also, squeeze_axis can be set to true only if input.shape[axis] == num_outputs.

Example:

z = split(x, axis=0, num_outputs=3, squeeze_axis=1) // a list of 3 arrays with shape (2, 1)
z = [[ 1.]
     [ 2.]]

    [[ 3.]
     [ 4.]]

    [[ 5.]
     [ 6.]]
z[0].shape = (2, 1)

Defined in /home/smola/mxnet/src/operator/slice_channel.cc:L104

Parameters:
  • data (ndarray) – The input

  • num_outputs (int, required) – Number of splits. Note that this should evenly divide the length of the axis.

  • axis (int, optional, default='1') – Axis along which to split.

  • squeeze_axis (boolean, optional, default=0) – If true, removes the axis of length 1 from the shapes of the output arrays. Note that it only removes the axis along which the array is split. Also, squeeze_axis can be set to true only if input.shape[axis] == num_outputs.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
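The split rules above can be sketched on nested lists in plain Python. The helper `split_nested` is hypothetical and supports non-negative axes only. It enforces that num_outputs evenly divides the axis, and that squeeze_axis requires one element per output. It reproduces the shapes in the examples above.

```python
def split_nested(x, num_outputs, axis=1, squeeze_axis=False):
    """Split a nested list into num_outputs pieces along a non-negative axis."""
    if axis == 0:
        n = len(x)
        if n % num_outputs:
            raise ValueError("num_outputs must evenly divide the axis length")
        size = n // num_outputs
        parts = [x[k * size:(k + 1) * size] for k in range(num_outputs)]
        if squeeze_axis:
            if size != 1:
                raise ValueError("squeeze_axis requires shape[axis] == num_outputs")
            parts = [p[0] for p in parts]  # drop the length-1 split axis
        return parts
    # Split every sub-array one level down, then regroup the k-th pieces.
    per_item = [split_nested(item, num_outputs, axis - 1, squeeze_axis) for item in x]
    return [[pieces[k] for pieces in per_item] for k in range(num_outputs)]


x = [[[1.], [2.]], [[3.], [4.]], [[5.], [6.]]]  # shape (3, 2, 1)
y = split_nested(x, 2, axis=1)                   # 2 arrays of shape (3, 1, 1)
z = split_nested(x, 3, axis=0, squeeze_axis=True)  # 3 arrays of shape (2, 1)
print(y[0])  # [[[1.0]], [[3.0]], [[5.0]]]
print(z[0])  # [[1.0], [2.0]]
```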

mxnet.ndarray.numpy_extension.slice_like(data=None, shape_like=None, axes=_Null, out=None, name=None, **kwargs)

Slices a region of the array like the shape of another array. This function is similar to slice; however, begin is always all zeros, and end of the specified axes is inferred from the second input shape_like.

Given the second input shape_like of shape=(d_0, d_1, ..., d_n-1), a slice_like operator with the default empty axes performs the following operation: out = slice(input, begin=(0, 0, ..., 0), end=(d_0, d_1, ..., d_n-1)).

When axes is not empty, it specifies which axes are sliced. Given a 4-d input data, a slice_like operator with axes=(0, 2, -1) performs the following operation: out = slice(input, begin=(0, 0, 0, 0), end=(d_0, None, d_2, d_3)).

The first and second inputs may have different numbers of dimensions; however, you must then specify axes and make sure they do not exceed the dimension limits. For example, given a with shape=(2,3,4,5) and b with shape=(1,2,3), out = slice_like(a, b) is not allowed because a has 4 dimensions and b has 3. The following is allowed in this situation: out = slice_like(a, b, axes=(0, 2))

Example:

x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]
y = [[  0.,   0.,   0.],
     [  0.,   0.,   0.]]
slice_like(x, y) = [[ 1.,  2.,  3.],
                    [ 5.,  6.,  7.]]
slice_like(x, y, axes=(0, 1)) = [[ 1.,  2.,  3.],
                                 [ 5.,  6.,  7.]]
slice_like(x, y, axes=(0,)) = [[ 1.,  2.,  3.,  4.],
                               [ 5.,  6.,  7.,  8.]]
slice_like(x, y, axes=(-1,)) = [[  1.,   2.,   3.],
                                [  5.,   6.,   7.],
                                [  9.,  10.,  11.]]

Defined in /home/smola/mxnet/src/operator/tensor/matrix_op.cc:L681

Parameters:
  • data (ndarray) – Source input

  • shape_like (ndarray) – Shape like input

  • axes (Shape(tuple), optional, default=[]) – List of axes on which input data will be sliced according to the corresponding size of the second input. By default, all axes are sliced. Negative axes are supported.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
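The end tuple that slice_like derives from shape_like can be sketched in plain Python. The helpers `slice_like_end` and `apply_2d` are hypothetical. The sketch also assumes that negative axes are normalized against the number of dimensions of data, which the text above does not state explicitly. None in the result means the axis is kept whole, and begin is always all zeros.

```python
def slice_like_end(data_shape, like_shape, axes=()):
    """End tuple passed to slice; None keeps the whole axis."""
    if not axes:
        if len(data_shape) != len(like_shape):
            raise ValueError("axes must be given when the inputs differ in ndim")
        return tuple(like_shape)
    end = [None] * len(data_shape)
    for ax in axes:
        ax_n = ax + len(data_shape) if ax < 0 else ax  # assumption: normalize against data
        if not (0 <= ax_n < len(data_shape) and ax_n < len(like_shape)):
            raise ValueError(f"axis {ax} is out of range")
        end[ax_n] = like_shape[ax_n]
    return tuple(end)


def apply_2d(x, end):
    """Slice a 2-D nested list with begin=(0, 0) and the given end."""
    return [row[:end[1]] for row in x[:end[0]]]


x = [[1., 2., 3., 4.], [5., 6., 7., 8.], [9., 10., 11., 12.]]
print(slice_like_end((3, 4), (2, 3), axes=(0,)))                  # (2, None)
print(apply_2d(x, slice_like_end((3, 4), (2, 3), axes=(-1,))))    # first 3 columns of all rows
print(slice_like_end((2, 3, 4, 5), (1, 2, 3), axes=(0, 2)))       # (1, None, 3, None)
```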

mxnet.ndarray.numpy_extension.smooth_l1(data=None, scalar=_Null, out=None, name=None, **kwargs)

Calculates the Smooth L1 loss of data element-wise, using scalar as \(\sigma\):

\[\begin{split}f(x) = \begin{cases} (\sigma x)^2/2,& \text{if }|x| < 1/\sigma^2\\ |x|-0.5/\sigma^2,& \text{otherwise} \end{cases}\end{split}\]

where \(x\) is an element of the input data and \(\sigma\) is the scalar.

Example:

smooth_l1([1, 2, 3, 4]) = [0.5, 1.5, 2.5, 3.5]
smooth_l1([1, 2, 3, 4], scalar=1) = [0.5, 1.5, 2.5, 3.5]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_binary_scalar_op_extended.cc:L138

Parameters:
  • data (ndarray) – source input

  • scalar (float) – scalar input

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
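The piecewise formula can be sketched element-wise in plain Python with a hypothetical function `smooth_l1_ref`. The sketch tests |x| against 1/σ², which makes the loss symmetric in x. At the boundary |x| = 1/σ², both branches give 0.5/σ², so the loss is continuous there.

```python
def smooth_l1_ref(data, scalar=1.0):
    """Element-wise Smooth L1 loss with sigma = scalar."""
    inv_sigma2 = 1.0 / (scalar * scalar)
    out = []
    for x in data:
        if abs(x) < inv_sigma2:
            out.append((scalar * x) ** 2 / 2)      # quadratic region near zero
        else:
            out.append(abs(x) - 0.5 * inv_sigma2)  # linear region
    return out


print(smooth_l1_ref([1, 2, 3, 4]))  # [0.5, 1.5, 2.5, 3.5]
```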

mxnet.ndarray.numpy_extension.softmax(data, axis=-1, length=None, temperature=None, use_length=False, dtype=None)

Applies the softmax function.

The resulting array contains elements in the range (0,1) and the elements along the given axis sum up to 1.

\[softmax(\mathbf{z/t})_j = \frac{e^{z_j/t}}{\sum_{k=1}^K e^{z_k/t}}\]

for \(j = 1, ..., K\)

t is the temperature parameter of the softmax function. By default, t equals 1.0.

Parameters:
  • data (NDArray) – The input array.

  • axis (int, optional, default='-1') – The axis along which to compute softmax.

  • length (NDArray) – The length array, used as a mask over data when use_length is true.

  • temperature (double or None, optional, default=None) – Temperature parameter in softmax

  • dtype ({None, 'float16', 'float32', 'float64'},optional, default='None') – DType of the output in case this can’t be inferred. Defaults to the same as input’s dtype if not defined (dtype=None).

  • use_length (boolean or None, optional, default=0) – Whether to use the length input as a mask over the data input.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Example

>>> data = np.ones((2, 3))
>>> npx.softmax(data, axis=0)
array([[0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5]])
>>> npx.softmax(data, axis=1)
array([[0.33333334, 0.33333334, 0.33333334],
       [0.33333334, 0.33333334, 0.33333334]])

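The formula with temperature can be sketched for one 1-D row in plain Python with a hypothetical function `softmax_ref`. The length handling is an assumption about how the length mask behaves: positions at or beyond length are excluded from the sum and set to 0. The maximum is subtracted before exponentiating, which does not change the result but avoids overflow.

```python
import math


def softmax_ref(z, temperature=1.0, length=None):
    """Softmax of a 1-D list; positions at or beyond `length` become 0 (assumed masking)."""
    n = len(z) if length is None else length
    scaled = [v / temperature for v in z[:n]]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps] + [0.0] * (len(z) - n)


print(softmax_ref([1.0, 1.0, 1.0]))                # three equal probabilities
print(softmax_ref([2.0, 0.0], temperature=2.0))    # same as softmax([1.0, 0.0])
print(softmax_ref([1.0, 2.0, 3.0], length=2))      # last entry masked to 0.0
```

A higher temperature flattens the distribution, and a lower one sharpens it toward the largest input.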
mxnet.ndarray.numpy_extension.softsign(data=None, out=None, name=None, **kwargs)

Computes softsign of x element-wise.

\[y = x / (1 + abs(x))\]

The storage type of the softsign output is always dense.

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L294

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.stop_gradient(data=None, out=None, name=None, **kwargs)

Stops gradient computation.

Stops the accumulated gradient of the inputs from flowing through this operator in the backward direction. In other words, this operator prevents the contribution of its inputs from being taken into account when computing gradients.

Example:

v1 = [1, 2]
v2 = [0, 1]
a = Variable('a')
b = Variable('b')
b_stop_grad = stop_gradient(3 * b)
loss = MakeLoss(b_stop_grad + a)

executor = loss.simple_bind(ctx=cpu(), a=(1,2), b=(1,2))
executor.forward(is_train=True, a=v1, b=v2)
executor.outputs
[ 1.  5.]

executor.backward()
executor.grad_arrays
[ 0.  0.]
[ 1.  1.]

Defined in /home/smola/mxnet/src/operator/tensor/elemwise_unary_op_basic.cc:L430

Parameters:
  • data (ndarray) – The input array.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays

mxnet.ndarray.numpy_extension.sync_batch_norm(data=None, gamma=None, beta=None, moving_mean=None, moving_var=None, eps=_Null, momentum=_Null, fix_gamma=_Null, use_global_stats=_Null, output_mean_var=_Null, ndev=_Null, key=_Null, out=None, name=None, **kwargs)

Batch normalization.

Normalizes a data batch by mean and variance, and applies a scale gamma as well as an offset beta. Standard BN [1] implementations only normalize the data within each device. SyncBN normalizes the input within the whole mini-batch. We follow the sync-once implementation described in the paper [2].

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:

\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]

Then compute the normalized output, which has the same shape as the input, as follows:

\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]

Both mean and var return a scalar by treating the input as a vector.

Assume the input has size k on axis 1; then both gamma and beta have shape (k,). If output_mean_var is set to true, the operator also outputs data_mean and data_var, which are needed for the backward pass.

Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var, which are k-length vectors. They are global statistics for the whole dataset, which are updated by:

moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
moving_var = moving_var * momentum + data_var * (1 - momentum)

If use_global_stats is set to be true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.

Both gamma and beta are learnable parameters. However, if fix_gamma is true, gamma is set to 1 and its gradient to 0.

Reference:

Defined in /home/smola/mxnet/src/operator/contrib/sync_batch_norm.cc:L97

Parameters:
  • data (ndarray) – Input data to batch normalization

  • gamma (ndarray) – gamma array

  • beta (ndarray) – beta array

  • moving_mean (ndarray) – running mean of input

  • moving_var (ndarray) – running variance of input

  • eps (float, optional, default=0.00100000005) – Epsilon to prevent division by zero

  • momentum (float, optional, default=0.899999976) – Momentum for moving average

  • fix_gamma (boolean, optional, default=1) – Fix gamma while training

  • use_global_stats (boolean, optional, default=0) – Whether to use the global moving statistics instead of the local batch statistics. This turns batch-norm into a scale-and-shift operator.

  • output_mean_var (boolean, optional, default=0) – Output all of the normalized output, the mean, and the variance

  • ndev (int, optional, default='1') – The count of GPU devices

  • key (string, required) – Hash key for synchronization. Set the same hash key for the same layer; Block.prefix is typically used, as in gluon.nn.contrib.SyncBatchNorm.

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
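Synchronization matters because per-device statistics differ from whole-batch statistics. The following sketch covers one channel in plain Python. It assumes each device shares its element count, sum, and sum of squares in a single synchronization step, in the spirit of the sync-once approach cited above. It is an illustration, not MXNet's kernel. The helper names are hypothetical, and the variance is the population (biased) variance.

```python
import math


def sync_channel_stats(shards):
    """Global mean and variance of one channel from per-device shards.

    Each device contributes (count, sum, sum of squares); these are the
    quantities that would be all-reduced across devices.
    """
    count = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(v * v for v in s) for s in shards)
    mean = total / count
    var = total_sq / count - mean * mean  # population (biased) variance
    return mean, var


def normalize(x, mean, var, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalized output for one element, as in the formula above."""
    return (x - mean) / math.sqrt(var + eps) * gamma + beta


def update_moving(moving, batch_stat, momentum=0.9):
    """moving = moving * momentum + batch_stat * (1 - momentum)."""
    return moving * momentum + batch_stat * (1 - momentum)


shards = [[1.0, 2.0], [3.0, 4.0]]  # one channel, split across two devices
mean, var = sync_channel_stats(shards)
print(mean, var)  # 2.5 1.25
```

Each device alone would see a mean of 1.5 or 3.5 and a variance of 0.25. The whole mini-batch has a mean of 2.5 and a variance of 1.25.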

mxnet.ndarray.numpy_extension.tensor_poisson(lam=None, shape=_Null, dtype=_Null, out=None, name=None, **kwargs)

Concurrent sampling from multiple Poisson distributions with parameters lambda (rate).

The parameters of the distributions are provided as an input array. Let [s] be the shape of the input array, n be the dimension of [s], [t] be the shape specified as the parameter of the operator, and m be the dimension of [t]. Then the output will be a (n+m)-dimensional array with shape [s]x[t].

For any valid n-dimensional index i with respect to the input array, output[i] will be an m-dimensional array that holds randomly drawn samples from the distribution which is parameterized by the input value at index i. If the shape parameter of the operator is not set, then one sample will be drawn per distribution and the output array has the same shape as the input array.

Samples will always be returned as a floating point data type.

Examples:

lam = [ 1.0, 8.5 ]

// Draw a single sample for each distribution
tensor_poisson(lam) = [  0.,  13.]

// Draw a vector containing two samples for each distribution
tensor_poisson(lam, shape=(2,)) = [[  0.,   4.],
                                   [ 13.,   8.]]

Defined in /home/smola/mxnet/src/operator/random/multisample_op.cc:L340

Parameters:
  • lam (ndarray) – Lambda (rate) parameters of the distributions.

  • shape (Shape(tuple), optional, default=[]) – Shape to be sampled from each random distribution.

  • dtype ({'None', 'float16', 'float32', 'float64'},optional, default='None') – DType of the output in case this can’t be inferred. Defaults to float32 if not defined (dtype=None).

  • out (ndarray, optional) – The output ndarray to hold the result.

Returns:

out – The output of this function.

Return type:

ndarray or list of ndarrays
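Python's standard library has no Poisson sampler. This sketch therefore swaps in Knuth's multiplication method, which is not necessarily what MXNet uses, purely to show the [s]x[t] shape rule: every scalar rate in lam is replaced by an array of shape `shape`, and samples are returned as floats. The names `poisson_knuth` and `tensor_poisson_sketch` are hypothetical.

```python
import math
import random


def poisson_knuth(lam, rng):
    """One Poisson(lam) sample via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return float(k)  # returned as a float, like the operator's output
        k += 1


def _draw(lam, shape, rng):
    """Nested list of samples with the given shape for a single rate."""
    if not shape:
        return poisson_knuth(lam, rng)
    return [_draw(lam, shape[1:], rng) for _ in range(shape[0])]


def tensor_poisson_sketch(lam, shape=(), seed=0):
    """Output shape is lam's shape followed by `shape` ([s]x[t])."""
    rng = random.Random(seed)

    def recurse(x):
        if isinstance(x, list):
            return [recurse(v) for v in x]
        return _draw(x, tuple(shape), rng)

    return recurse(lam)


print(tensor_poisson_sketch([1.0, 8.5]))              # one sample per rate
print(tensor_poisson_sketch([1.0, 8.5], shape=(2,)))  # a 2 x 2 nested list
```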

mxnet.ndarray.numpy_extension.topk(data, axis=-1, k=1, ret_typ='indices', is_ascend=False, dtype='float32')
By default, returns the indices of the top k elements in an input array along the given axis. If ret_typ is set to ‘value’, returns the values of the top k elements instead of the indices. If ret_typ is ‘both’, both values and indices are returned. The returned elements are sorted.

Parameters:
  • data (NDArray) – The input array

  • axis (int or None, optional, default='-1') – Axis along which to choose the top k indices. If not given, the flattened array is used. Default is -1.

  • k (int, optional, default='1') – Number of top elements to select. It should always be less than or equal to the number of elements along the given axis. A global sort is performed if k < 1.

  • ret_typ ({'both', 'indices', 'mask', 'value'},optional, default='indices') – The return type. “value” means to return the top k values, “indices” means to return the indices of the top k values, “mask” means to return a mask array containing 0 and 1. 1 means the top k values. “both” means to return a list of both values and indices of top k elements.

  • is_ascend (boolean, optional, default=0) – Whether to choose k largest or k smallest elements. Top K largest elements will be chosen if set to false.

  • dtype ({'float16', 'float32', 'float64', 'int32', 'int64', 'uint8'}, optional, default='float32') – DType of the output indices when ret_typ is “indices” or “both”. An error will be raised if the selected data type cannot precisely represent the indices.

Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Example

>>> x = np.array([[0.3, 0.2, 0.4], [0.1, 0.3, 0.2]])

returns an index of the largest element on last axis

>>> npx.topk(x)
array([[2.],
       [1.]])

returns the value of top-2 largest elements on last axis

>>> npx.topk(x, ret_typ='value', k=2)
array([[0.4, 0.3],
       [0.3, 0.2]])

returns the value of top-2 smallest elements on last axis

>>> npx.topk(x, ret_typ='value', k=2, is_ascend=1)
array([[0.2, 0.3],
       [0.1, 0.2]])

returns the value of top-2 largest elements on axis 0

>>> npx.topk(x, axis=0, ret_typ='value', k=2)
array([[0.3, 0.3, 0.4],
       [0.1, 0.2, 0.2]])

returns a list of both values and indices of the top-2 largest elements on last axis

>>> npx.topk(x, ret_typ='both', k=2)
[array([[0.4, 0.3], [0.3, 0.2]]),
 array([[2., 0.], [1., 2.]])]

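The ret_typ options can be sketched for a single 1-D row in plain Python with a hypothetical function `topk_row`. Indices are returned as floats, matching the default float32 dtype in the examples. Ties keep their original order here because Python's sort is stable, and MXNet's tie order may differ. For axis=0, apply the function to each column instead.

```python
def topk_row(row, k=1, ret_typ="indices", is_ascend=False):
    """Top-k selection on one row; k < 1 means a full sort."""
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=not is_ascend)
    if k >= 1:
        order = order[:k]
    values = [row[i] for i in order]
    indices = [float(i) for i in order]
    if ret_typ == "value":
        return values
    if ret_typ == "indices":
        return indices
    if ret_typ == "mask":
        chosen = set(order)
        return [1.0 if i in chosen else 0.0 for i in range(len(row))]
    if ret_typ == "both":
        return [values, indices]
    raise ValueError(f"unknown ret_typ: {ret_typ}")


print(topk_row([0.3, 0.2, 0.4]))                          # [2.0]
print(topk_row([0.1, 0.3, 0.2], k=2, ret_typ="both"))     # [[0.3, 0.2], [1.0, 2.0]]
print(topk_row([0.3, 0.2, 0.4], k=2, ret_typ="mask"))     # [1.0, 0.0, 1.0]
```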
mxnet.ndarray.numpy_extension.while_loop(cond, func, loop_vars, max_iterations=None, name='while_loop')

Run a while loop with user-defined computation and loop condition.

This operator simulates a while loop that iteratively performs a customized computation as long as the condition is satisfied.

loop_vars is a list of NDArrays on which the computation operates.

cond is a user-defined function, used as the loop condition. It consumes loop_vars, and produces a scalar MXNet NDArray, indicating the termination of the loop. The loop ends when cond returns false (zero). The cond is variadic, and its signature should be cond(*loop_vars) => NDArray.

func is a user-defined function, used as the loop body. It also consumes loop_vars, and produces step_output and new_loop_vars at each step. step_output should contain the same number of elements in every step, and across all steps the i-th element of step_output should have the same shape and dtype. Also, new_loop_vars should contain the same number of elements as loop_vars, and corresponding elements should have the same shape and dtype. The func is variadic, and its signature should be func(*loop_vars) => (NDArray or nested List[NDArray] step_output, NDArray or nested List[NDArray] new_loop_vars).

max_iterations is a scalar that defines the maximum number of iterations allowed.

This function returns two lists. The first list has the length of |step_output|; its i-th element stacks the i-th elements of step_output from all steps along axis 0. The second list has the length of |loop_vars| and holds the final states of the loop variables.

Warning

For now, axis 0 of every NDArray in the first list has length max_iterations, due to the lack of dynamic shape inference.

Warning

When cond is never satisfied, we assume step_output is empty, because it cannot be inferred. This is different from the symbolic version.

Parameters:
  • cond (a Python function.) – The loop condition.

  • func (a Python function.) – The loop body.

  • loop_vars (an NDArray or nested lists of NDArrays.) – The initial values of the loop variables.

  • max_iterations (a python int.) – Maximum number of iterations.

Returns:

  • outputs (an NDArray or nested lists of NDArrays) – stacked output from each step

  • states (an NDArray or nested lists of NDArrays) – final state

Examples

>>> cond = lambda i, s: i <= 5
>>> func = lambda i, s: ([i + s], [i + 1, s + i])
>>> loop_vars = (mx.np.array([0], dtype="int64"), mx.np.array([1], dtype="int64"))
>>> outputs, states = mx.npx.while_loop(cond, func, loop_vars, max_iterations=10)
>>> outputs
[array([[ 1],
       [ 2],
       [ 4],
       [ 7],
       [11],
       [16],
       [ 0],
       [ 0],
       [ 0],
       [ 0]], dtype=int64)]
>>> states
[array([6], dtype=int64), array([16], dtype=int64)]
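The semantics above can be mimicked eagerly in plain Python, with ints standing in for NDArrays. This sketch uses a hypothetical name, `while_loop_sketch`. It stacks each step's outputs, stops at max_iterations, and returns an empty output list when cond is never satisfied. It pads the stacked outputs to max_iterations with zeros to match the example output above, but that padding value is an assumption drawn from the example, not a documented guarantee.

```python
def while_loop_sketch(cond, func, loop_vars, max_iterations):
    """Eager emulation of while_loop on plain Python values."""
    loop_vars = list(loop_vars)
    steps = []  # step_output collected from each iteration
    while len(steps) < max_iterations and cond(*loop_vars):
        step_output, new_loop_vars = func(*loop_vars)
        if len(new_loop_vars) != len(loop_vars):
            raise ValueError("func must return as many loop vars as it received")
        steps.append(list(step_output))
        loop_vars = list(new_loop_vars)
    if not steps:  # cond never held: step_output cannot be inferred
        return [], loop_vars
    pad = [0] * (max_iterations - len(steps))  # assumed zero padding
    outputs = [[s[i] for s in steps] + pad for i in range(len(steps[0]))]
    return outputs, loop_vars


cond = lambda i, s: i <= 5
func = lambda i, s: ([i + s], [i + 1, s + i])
outputs, states = while_loop_sketch(cond, func, (0, 1), max_iterations=10)
print(outputs)  # [[1, 2, 4, 7, 11, 16, 0, 0, 0, 0]]
print(states)   # [6, 16]
```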

Modules

control_flow

Namespace for registering control flow ops for imperative programming.

image

Image pre-processing operators.

random

Namespace for operators used in Gluon dispatched by F=ndarray.