mxnet.image.detection

Read images and perform augmentations for object detection.

Functions

`CreateDetAugmenter`(data_shape[, resize, ...])	Create augmenters for detection.
`CreateMultiRandCropAugmenter`([...])	Helper function to create multiple random crop augmenters.

Classes

`DetAugmenter`(**kwargs)	Detection base augmenter
`DetBorrowAug`(augmenter)	Borrow standard augmenter from image classification.
`DetHorizontalFlipAug`(p)	Random horizontal flipping.
`DetRandomCropAug`([min_object_covered, ...])	Random cropping with constraints
`DetRandomPadAug`([aspect_ratio_range, ...])	Random padding augmenter.
`DetRandomSelectAug`(aug_list[, skip_prob])	Randomly select one augmenter to apply, with chance to skip all.
`ImageDetIter`(batch_size, data_shape[, ...])	Image iterator with a large number of augmentation choices for detection.

mxnet.image.detection.CreateDetAugmenter(data_shape, resize=0, rand_crop=0, rand_pad=0, rand_gray=0, rand_mirror=False, mean=None, std=None, brightness=0, contrast=0, saturation=0, pca_noise=0, hue=0, inter_method=2, min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 3.0), min_eject_coverage=0.3, max_attempts=50, pad_val=(127, 127, 127))

Create augmenters for detection.

Parameters:

data_shape (tuple of int) – Shape for output data
resize (int) – Resize the shorter edge to this size at the beginning, if larger than 0
rand_crop (float) – [0, 1], probability to apply random cropping
rand_pad (float) – [0, 1], probability to apply random padding
rand_gray (float) – [0, 1], probability to convert to grayscale for all channels
rand_mirror (bool) – Whether to apply horizontal flip to image with probability 0.5
mean (np.ndarray or None) – Mean pixel values for [r, g, b]
std (np.ndarray or None) – Standard deviations for [r, g, b]
brightness (float) – Brightness jittering range (percent)
contrast (float) – Contrast jittering range (percent)
saturation (float) – Saturation jittering range (percent)
hue (float) – Hue jittering range (percent)
pca_noise (float) – Pca noise level (percent)
inter_method (int, default=2 (area-based)) – Interpolation method for all resizing operations. Possible values:
    0: Nearest-neighbor interpolation.
    1: Bilinear interpolation.
    2: Area-based interpolation (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results, but when the image is zoomed it is similar to the nearest-neighbor method. Used by default.
    3: Bicubic interpolation over a 4x4 pixel neighborhood.
    4: Lanczos interpolation over an 8x8 pixel neighborhood.
    9: Cubic for enlarging, area-based for shrinking, bilinear for others.
    10: Random selection from the interpolation methods mentioned above.
    Note: When shrinking an image, area-based interpolation generally looks best; when enlarging an image, bicubic (slow) or bilinear (faster but still acceptable) generally looks best.
min_object_covered (float) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.
min_eject_coverage (float) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.
aspect_ratio_range (tuple of floats) – The cropped area of the image must have an aspect ratio = width / height within this range.
area_range (tuple of floats) – The cropped area of the image must contain a fraction of the supplied image within this range.
max_attempts (int) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.
pad_val (float or tuple of float) – Pixel value used to fill padded regions when padding is enabled. The mean is automatically subtracted from pad_val and the result divided by std, if applicable.

Examples

>>> # An example of creating multiple augmenters
>>> augs = mx.image.CreateDetAugmenter(data_shape=(3, 300, 300), rand_crop=0.5,
...    rand_pad=0.5, rand_mirror=True, mean=True, brightness=0.125, contrast=0.125,
...    saturation=0.125, pca_noise=0.05, inter_method=10, min_object_covered=[0.3, 0.5, 0.9],
...    area_range=(0.3, 3.0))
>>> # dump the details
>>> for aug in augs:
...    aug.dumps()
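
The pad_val description above amounts to per-channel arithmetic. The following is a minimal sketch of that normalization, not MXNet's implementation: `normalize_pad_val` is a hypothetical helper, and the mean and std values are made up for illustration.

```python
def normalize_pad_val(pad_val, mean=None, std=None):
    """Hypothetical helper: return (pad_val - mean) / std for each channel."""
    values = [float(v) for v in pad_val]
    if mean is not None:
        values = [v - m for v, m in zip(values, mean)]
    if std is not None:
        values = [v / s for v, s in zip(values, std)]
    return tuple(values)


# Illustrative values only; they are not MXNet defaults.
padded = normalize_pad_val((127, 127, 127), mean=(127.0, 117.0, 107.0), std=(2.0, 2.0, 2.0))
print(padded)  # (0.0, 5.0, 10.0)
```

Normalizing the fill value the same way as the image keeps padded regions consistent with the rest of the normalized input.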

mxnet.image.detection.CreateMultiRandCropAugmenter(min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 1.0), min_eject_coverage=0.3, max_attempts=50, skip_prob=0)

Helper function to create multiple random crop augmenters.

Parameters:

min_object_covered (float or list of float, default=0.1) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.
min_eject_coverage (float or list of float, default=0.3) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.
aspect_ratio_range (tuple of floats or list of tuple of floats, default=(0.75, 1.33)) – The cropped area of the image must have an aspect ratio = width / height within this range.
area_range (tuple of floats or list of tuple of floats, default=(0.05, 1.0)) – The cropped area of the image must contain a fraction of the supplied image within this range.
max_attempts (int or list of int, default=50) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.

Examples

>>> # An example of creating multiple random crop augmenters
>>> min_object_covered = [0.1, 0.3, 0.5, 0.7, 0.9]  # use 5 augmenters
>>> aspect_ratio_range = (0.75, 1.33)  # use same range for all augmenters
>>> area_range = [(0.1, 1.0), (0.2, 1.0), (0.2, 1.0), (0.3, 0.9), (0.5, 1.0)]
>>> min_eject_coverage = 0.3
>>> max_attempts = 50
>>> aug = mx.image.det.CreateMultiRandCropAugmenter(min_object_covered=min_object_covered,
...     aspect_ratio_range=aspect_ratio_range, area_range=area_range,
...     min_eject_coverage=min_eject_coverage, max_attempts=max_attempts,
...     skip_prob=0)
>>> aug.dumps()  # show some details
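
Each parameter accepts either a single value or a list, and the example mixes both. One plausible way to reconcile them is to repeat single values until every parameter has one entry per augmenter. The sketch below illustrates that idea under this assumption; `align_parameters` is a hypothetical helper, and the library's actual handling may differ.

```python
def align_parameters(*params):
    """Hypothetical helper: broadcast single settings to the longest list length."""
    # A tuple such as (0.75, 1.33) counts as one setting, so only lists are expanded.
    as_lists = [p if isinstance(p, list) else [p] for p in params]
    count = max(len(p) for p in as_lists)
    aligned = []
    for p in as_lists:
        if len(p) not in (1, count):
            raise ValueError("list parameters must share one length")
        aligned.append(p * count if len(p) == 1 else p)
    return aligned


covered, ratios, attempts = align_parameters([0.1, 0.5, 0.9], (0.75, 1.33), 50)
print(ratios)    # [(0.75, 1.33), (0.75, 1.33), (0.75, 1.33)]
print(attempts)  # [50, 50, 50]
```

After alignment, entry i of every list would describe the i-th random crop augmenter.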

class mxnet.image.detection.DetAugmenter(**kwargs)

Bases: object

Detection base augmenter

dumps()

Saves the Augmenter to a string.

Returns: JSON-formatted string that describes the Augmenter.
Return type: str

class mxnet.image.detection.DetBorrowAug(augmenter)

Bases: DetAugmenter

Borrow a standard augmenter from image classification. This is appropriate when you know the augmenter does not affect the label.

Parameters: augmenter (mx.image.Augmenter) – The borrowed standard augmenter, which has no effect on the label

dumps(): Overrides the default implementation to avoid a duplicate dump.

class mxnet.image.detection.DetHorizontalFlipAug(p)

Bases: DetAugmenter

Random horizontal flipping.

Parameters: p (float) – Probability in [0, 1] of flipping the image
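
Flipping an image for detection also requires mirroring its bounding boxes. The sketch below shows only that label update. It assumes each label row is (class_id, xmin, ymin, xmax, ymax) with coordinates normalized to [0, 1]; `maybe_flip_boxes` is a hypothetical helper and does not touch the image pixels.

```python
import random


def maybe_flip_boxes(boxes, p, rng=random):
    """Mirror normalized boxes horizontally with probability p.

    Assumes rows of (class_id, xmin, ymin, xmax, ymax) with coordinates in [0, 1].
    """
    if rng.random() >= p:
        return list(boxes)  # no flip this time
    # After mirroring, the old right edge becomes the new left edge.
    return [(c, 1.0 - xmax, ymin, 1.0 - xmin, ymax) for c, xmin, ymin, xmax, ymax in boxes]


print(maybe_flip_boxes([(0, 0.25, 0.2, 0.5, 0.8)], p=1.0))
# [(0, 0.5, 0.2, 0.75, 0.8)]
```

Swapping the x coordinates while subtracting from 1 keeps xmin smaller than xmax after the flip.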

class mxnet.image.detection.DetRandomCropAug(min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 1.0), min_eject_coverage=0.3, max_attempts=50)

Bases: DetAugmenter

Random cropping with constraints

Parameters:

min_object_covered (float, default=0.1) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.
min_eject_coverage (float, default=0.3) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.
aspect_ratio_range (tuple of floats, default=(0.75, 1.33)) – The cropped area of the image must have an aspect ratio = width / height within this range.
area_range (tuple of floats, default=(0.05, 1.0)) – The cropped area of the image must contain a fraction of the supplied image within this range.
max_attempts (int, default=50) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.
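
The min_object_covered constraint can be illustrated with a small coverage check. This is a sketch of one possible reading of the parameter description (every box that the crop touches must be covered by at least the given fraction, and a threshold of 0 allows a crop that touches no box); it is not MXNet's implementation. Both helper names are hypothetical, and boxes and crops are assumed to be (xmin, ymin, xmax, ymax) tuples in normalized coordinates.

```python
def covered_fraction(box, crop):
    """Fraction of the box area inside the crop; both are (xmin, ymin, xmax, ymax)."""
    overlap_w = max(0.0, min(box[2], crop[2]) - max(box[0], crop[0]))
    overlap_h = max(0.0, min(box[3], crop[3]) - max(box[1], crop[1]))
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    return overlap_w * overlap_h / box_area if box_area > 0 else 0.0


def crop_is_acceptable(boxes, crop, min_object_covered):
    """Assumed rule: every box the crop touches must be covered enough."""
    touched = [f for f in (covered_fraction(b, crop) for b in boxes) if f > 0]
    if not touched:
        # Only a threshold of 0 permits a crop that overlaps no box.
        return min_object_covered == 0
    return all(f >= min_object_covered for f in touched)


box = (0.0, 0.0, 0.5, 0.5)
crop = (0.25, 0.0, 1.0, 1.0)
print(covered_fraction(box, crop))  # 0.5
```

In a retry loop, a candidate crop failing this check would be resampled, up to max_attempts times, before falling back to the original image.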

class mxnet.image.detection.DetRandomPadAug(aspect_ratio_range=(0.75, 1.33), area_range=(1.0, 3.0), max_attempts=50, pad_val=(128, 128, 128))

Bases: DetAugmenter

Random padding augmenter.

Parameters:

aspect_ratio_range (tuple of floats, default=(0.75, 1.33)) – The padded area of the image must have an aspect ratio = width / height within this range.
area_range (tuple of floats, default=(1.0, 3.0)) – The padded area of the image must be larger than the original area
max_attempts (int, default=50) – Number of attempts at generating a padded region of the image of the specified constraints. After max_attempts failures, return the original image.
pad_val (float or tuple of float, default=(128, 128, 128)) – Pixel value to be filled when padding is enabled.
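
Padding places the original image inside a larger canvas, so normalized box coordinates have to be re-expressed relative to that canvas. The sketch below shows the arithmetic under stated assumptions: boxes are (xmin, ymin, xmax, ymax) in [0, 1], sizes are (width, height) in pixels, and `remap_box_after_pad` is a hypothetical helper rather than part of the library.

```python
def remap_box_after_pad(box, image_size, canvas_size, offset):
    """Re-express a normalized box relative to the padded canvas.

    image_size and canvas_size are (width, height) in pixels; offset is the
    (x, y) pixel position of the original image inside the canvas.
    """
    (w, h), (cw, ch), (ox, oy) = image_size, canvas_size, offset
    xmin, ymin, xmax, ymax = box
    return ((xmin * w + ox) / cw, (ymin * h + oy) / ch,
            (xmax * w + ox) / cw, (ymax * h + oy) / ch)


# A 100x100 image placed at x=30, y=0 in a 160x125 canvas (2x the area).
print(remap_box_after_pad((0.0, 0.0, 1.0, 1.0), (100, 100), (160, 125), (30, 0)))
# (0.1875, 0.0, 0.8125, 0.8)
```

The canvas in this example has twice the original area and an aspect ratio of 1.28, which falls inside the default area_range and aspect_ratio_range.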

class mxnet.image.detection.DetRandomSelectAug(aug_list, skip_prob=0)

Bases: DetAugmenter

Randomly select one augmenter to apply, with chance to skip all.

Parameters:

aug_list (list of DetAugmenter) – The random selection will be applied to one of the augmenters
skip_prob (float) – The probability to skip all augmenters and return input directly

dumps(): Overrides the default implementation.
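
The selection behavior described above can be sketched in a few lines. This is an illustration only: it assumes each augmenter is a callable that takes and returns a (src, label) pair, and `random_select_apply`, `tag_a`, and `tag_b` are hypothetical names used as stand-ins.

```python
import random


def random_select_apply(aug_list, skip_prob, src, label, rng=random):
    """Skip all augmenters with probability skip_prob, else apply one at random.

    Assumes each augmenter is a callable taking and returning (src, label).
    """
    if rng.random() < skip_prob:
        return src, label  # input returned unchanged
    return rng.choice(aug_list)(src, label)


def tag_a(src, label):  # stand-in augmenter for illustration
    return src + "A", label


def tag_b(src, label):  # stand-in augmenter for illustration
    return src + "B", label


print(random_select_apply([tag_a, tag_b], 1.0, "img", []))  # ('img', [])
```

With skip_prob=0 exactly one of the listed augmenters is always applied; with skip_prob=1 the input is always returned as-is.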

class mxnet.image.detection.ImageDetIter(batch_size, data_shape, path_imgrec=None, path_imglist=None, path_root=None, path_imgidx=None, shuffle=False, part_index=0, num_parts=1, aug_list=None, imglist=None, data_name='data', label_name='label', last_batch_handle='pad', **kwargs)

Bases: ImageIter

Image iterator with a large number of augmentation choices for detection.

Parameters:

aug_list (list or None) – Augmenter list for generating distorted images
batch_size (int) – Number of examples per batch.
data_shape (tuple) – Data shape in (channels, height, width) format. For now, only RGB image with 3 channels is supported.
path_imgrec (str) – Path to image record file (.rec). Created with tools/im2rec.py or bin/im2rec.
path_imglist (str) – Path to image list (.lst). Created with tools/im2rec.py or with custom script. Format: Tab separated record of index, one or more labels and relative_path_from_root.
imglist (list) – A list of images with their label(s). Each item is a list [imagelabel, imgpath], where imagelabel is a float or a list of floats.
path_root (str) – Root folder of image files.
path_imgidx (str) – Path to image index file. Needed for partition and shuffling when using .rec source.
shuffle (bool) – Whether to shuffle all images at the start of each iteration or not. Can be slow for HDD.
part_index (int) – Partition index.
num_parts (int) – Total number of partitions.
data_name (str) – Data name for provided symbols.
label_name (str) – Name for detection labels
last_batch_handle (str, optional) – How to handle the last batch. This parameter can be ‘pad’ (default), ‘discard’, or ‘roll_over’. If ‘pad’, the last batch is padded with data starting from the beginning. If ‘discard’, the last batch is discarded. If ‘roll_over’, the remaining elements are rolled over to the next iteration.
kwargs – More arguments for creating augmenter. See mx.image.CreateDetAugmenter.
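
The three last_batch_handle modes differ only in what happens to the samples left over when the dataset size is not a multiple of batch_size. The sketch below models that on a plain list, assuming the list holds at least batch_size items; `make_batches` is a hypothetical helper and does not reflect the iterator's internals.

```python
def make_batches(items, batch_size, last_batch_handle="pad"):
    """Return (batches, leftover) for one pass over items."""
    full = len(items) - len(items) % batch_size
    batches = [items[i:i + batch_size] for i in range(0, full, batch_size)]
    rest = items[full:]
    if not rest or last_batch_handle == "discard":
        return batches, []
    if last_batch_handle == "pad":
        # Fill the last batch with samples taken from the beginning.
        batches.append(rest + items[:batch_size - len(rest)])
        return batches, []
    if last_batch_handle == "roll_over":
        # Hand the remainder back so the caller can prepend it to the next pass.
        return batches, rest
    raise ValueError("unknown last_batch_handle: %r" % (last_batch_handle,))


print(make_batches(list(range(5)), 2, "pad"))        # ([[0, 1], [2, 3], [4, 0]], [])
print(make_batches(list(range(5)), 2, "roll_over"))  # ([[0, 1], [2, 3]], [4])
```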

augmentation_transform(data, label): Overrides the base method. Transforms input data with the specified augmentations.

check_label_shape(label_shape): Checks if the new label shape is valid.

draw_next(color=None, thickness=2, mean=None, std=None, clip=True, waitKey=None, window_name='draw_next', id2labels=None)

Display next image with bounding boxes drawn.

Parameters:

color (tuple) – Bounding box color in RGB, use None for random color
thickness (int) – Bounding box border thickness
mean (True or numpy.ndarray) – Compensate for the mean to have better visual effect
std (True or numpy.ndarray) – Revert standard deviations
clip (bool) – If true, clip to [0, 255] for better visual effect
waitKey (None or int) – Hold the window for waitKey milliseconds if set; skip plotting if None
window_name (str) – Plot window name if waitKey is set.
id2labels (dict) – Mapping of labels id to labels name.

Returns:

numpy.ndarray

Examples

>>> # use draw_next to get images with bounding boxes drawn
>>> iterator = mx.image.ImageDetIter(1, (3, 600, 600), path_imgrec='train.rec')
>>> for image in iterator.draw_next(waitKey=None):
...     pass  # display the image here
>>> # or let draw_next display using cv2 module
>>> for image in iterator.draw_next(waitKey=0, window_name='disp'):
...     pass
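
The mean, std, and clip parameters of draw_next undo input normalization so that the drawn image looks natural. The per-pixel arithmetic can be sketched as follows; `denormalize_pixel` is a hypothetical helper, the statistics are made up for illustration, and the order of operations (multiply by std, then add mean) is an assumption based on reversing (x - mean) / std.

```python
def denormalize_pixel(pixel, mean=None, std=None, clip=True):
    """Hypothetical helper: undo (x - mean) / std for one RGB pixel."""
    values = [float(v) for v in pixel]
    if std is not None:
        values = [v * s for v, s in zip(values, std)]
    if mean is not None:
        values = [v + m for v, m in zip(values, mean)]
    if clip:
        # Keep values inside the displayable range [0, 255].
        values = [min(255.0, max(0.0, v)) for v in values]
    return tuple(values)


# Illustrative statistics only; they are not MXNet defaults.
print(denormalize_pixel((0.0, 5.0, 100.0), mean=(127.0, 117.0, 107.0), std=(2.0, 2.0, 2.0)))
# (127.0, 127.0, 255.0)
```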

next(): Overrides the base method to return the next batch.

reshape(data_shape=None, label_shape=None)

Reshape iterator for data_shape or label_shape.

Parameters:

data_shape (tuple or None) – Reshape the data_shape to the new shape if not None
label_shape (tuple or None) – Reshape label shape to new shape if not None

sync_label_shape(it, verbose=False)

Synchronize label shape with the input iterator. This is useful when train/validation iterators have different label padding.

Parameters:

it (ImageDetIter) – The other iterator to synchronize
verbose (bool) – Print verbose log if true

Returns:

The synchronized other iterator, the internal label shape is updated as well.

Return type:

ImageDetIter

Examples

>>> train_iter = mx.image.ImageDetIter(32, (3, 300, 300), path_imgrec='train.rec')
>>> val_iter = mx.image.ImageDetIter(32, (3, 300, 300), path_imgrec='val.rec')
>>> train_iter.label_shape
(30, 6)
>>> val_iter.label_shape
(25, 6)
>>> val_iter = train_iter.sync_label_shape(val_iter, verbose=False)
>>> train_iter.label_shape
(30, 6)
>>> val_iter.label_shape
(30, 6)
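
The effect shown in the example, where both iterators end up with the larger label shape, can be sketched with plain lists. This is an illustration under assumptions, not the library's code: the shared shape is taken as the element-wise maximum, unused label entries are filled with -1, the rows are assumed to fit inside the target shape, and both helper names are hypothetical.

```python
def synced_label_shape(shape_a, shape_b):
    """Assumed rule: take the element-wise maximum of the two label shapes."""
    return (max(shape_a[0], shape_b[0]), max(shape_a[1], shape_b[1]))


def pad_label(rows, label_shape, fill=-1.0):
    """Pad label rows to label_shape; fill=-1.0 is an assumed padding marker."""
    num_rows, width = label_shape
    padded = [list(row) + [fill] * (width - len(row)) for row in rows]
    padded += [[fill] * width for _ in range(num_rows - len(padded))]
    return padded


print(synced_label_shape((30, 6), (25, 6)))  # (30, 6)
print(pad_label([[0.0, 0.1, 0.2, 0.3, 0.4]], (2, 6)))
# [[0.0, 0.1, 0.2, 0.3, 0.4, -1.0], [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0]]
```

A shared label shape lets training and validation batches feed the same network input without reshaping.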