mxnet.image.detection

Read images and perform augmentations for object detection.

Functions

CreateDetAugmenter(data_shape[, resize, ...])

Create augmenters for detection.

CreateMultiRandCropAugmenter([...])

Helper function to create multiple random crop augmenters.

Classes

DetAugmenter(**kwargs)

Detection base augmenter

DetBorrowAug(augmenter)

Borrow standard augmenter from image classification.

DetHorizontalFlipAug(p)

Random horizontal flipping.

DetRandomCropAug([min_object_covered, ...])

Random cropping with constraints

DetRandomPadAug([aspect_ratio_range, ...])

Random padding augmenter.

DetRandomSelectAug(aug_list[, skip_prob])

Randomly select one augmenter to apply, with chance to skip all.

ImageDetIter(batch_size, data_shape[, ...])

Image iterator with a large number of augmentation choices for detection.

mxnet.image.detection.CreateDetAugmenter(data_shape, resize=0, rand_crop=0, rand_pad=0, rand_gray=0, rand_mirror=False, mean=None, std=None, brightness=0, contrast=0, saturation=0, pca_noise=0, hue=0, inter_method=2, min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 3.0), min_eject_coverage=0.3, max_attempts=50, pad_val=(127, 127, 127))[source]

Create augmenters for detection.

Parameters:
  • data_shape (tuple of int) – Shape for output data

  • resize (int) – Resize shorter edge if larger than 0 at the beginning

  • rand_crop (float) – [0, 1], probability to apply random cropping

  • rand_pad (float) – [0, 1], probability to apply random padding

  • rand_gray (float) – [0, 1], probability to convert to grayscale for all channels

  • rand_mirror (bool) – Whether to apply horizontal flip to image with probability 0.5

  • mean (np.ndarray or None) – Mean pixel values for [r, g, b]

  • std (np.ndarray or None) – Standard deviations for [r, g, b]

  • brightness (float) – Brightness jittering range (percent)

  • contrast (float) – Contrast jittering range (percent)

  • saturation (float) – Saturation jittering range (percent)

  • hue (float) – Hue jittering range (percent)

  • pca_noise (float) – Pca noise level (percent)

  • inter_method (int, default=2(Area-based)) –

    Interpolation method for all resizing operations

    Possible values:

    0: Nearest Neighbors interpolation.
    1: Bilinear interpolation.
    2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method (used by default).
    3: Bicubic interpolation over a 4x4 pixel neighborhood.
    4: Lanczos interpolation over an 8x8 pixel neighborhood.
    9: Cubic for enlarging, area for shrinking, bilinear for others.
    10: Randomly select one of the interpolation methods mentioned above.

    Note: When shrinking an image, it will generally look best with area-based interpolation, whereas when enlarging an image, it will generally look best with bicubic (slow) or bilinear (faster but still looks OK).

  • min_object_covered (float) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.

  • min_eject_coverage (float) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.

  • aspect_ratio_range (tuple of floats) – The cropped area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats) – The cropped area of the image must contain a fraction of the supplied image within this range.

  • max_attempts (int) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.

  • pad_val (float or tuple of float) – Pixel value used to fill padded regions when padding is enabled. If mean and std are applicable, mean is automatically subtracted from pad_val and the result is divided by std.
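The arithmetic described for pad_val can be sketched in plain Python. This is only an illustration of the stated (pad_val - mean) / std relation, not MXNet's code; the helper name normalize_pad_value and the mean/std numbers are assumptions made for the example.

```python
def normalize_pad_value(pad_val, mean=None, std=None):
    """Return the per-channel fill value after optional (pad_val - mean) / std.

    Hypothetical helper for illustration; not part of mxnet.image.detection.
    """
    out = [float(v) for v in pad_val]
    if mean is not None:
        # Subtract the per-channel mean, as described for pad_val.
        out = [v - m for v, m in zip(out, mean)]
    if std is not None:
        # Divide by the per-channel standard deviation.
        out = [v / s for v, s in zip(out, std)]
    return tuple(out)


# Without mean/std the fill value is unchanged.
raw_fill = normalize_pad_value((127, 127, 127))

# With mean/std (illustrative numbers) the fill value lands in normalized space.
norm_fill = normalize_pad_value((127, 127, 127),
                                mean=(100.0, 100.0, 100.0),
                                std=(2.0, 2.0, 2.0))
```

The point of the sketch is that a padded border filled with a mid-gray value stays close to zero once the image is normalized, rather than appearing as an extreme value.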

Examples

>>> # An example of creating multiple augmenters
>>> import mxnet as mx
>>> augs = mx.image.CreateDetAugmenter(data_shape=(3, 300, 300), rand_crop=0.5,
...    rand_pad=0.5, rand_mirror=True, mean=True, brightness=0.125, contrast=0.125,
...    saturation=0.125, pca_noise=0.05, inter_method=10, min_object_covered=[0.3, 0.5, 0.9],
...    area_range=(0.3, 3.0))
>>> # dump the details
>>> for aug in augs:
...    aug.dumps()

mxnet.image.detection.CreateMultiRandCropAugmenter(min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 1.0), min_eject_coverage=0.3, max_attempts=50, skip_prob=0)[source]

Helper function to create multiple random crop augmenters.

Parameters:
  • min_object_covered (float or list of float, default=0.1) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.

  • min_eject_coverage (float or list of float, default=0.3) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.

  • aspect_ratio_range (tuple of floats or list of tuple of floats, default=(0.75, 1.33)) – The cropped area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats or list of tuple of floats, default=(0.05, 1.0)) – The cropped area of the image must contain a fraction of the supplied image within this range.

  • max_attempts (int or list of int, default=50) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.

Examples

>>> # An example of creating multiple random crop augmenters
>>> import mxnet as mx
>>> min_object_covered = [0.1, 0.3, 0.5, 0.7, 0.9]  # use 5 augmenters
>>> aspect_ratio_range = (0.75, 1.33)  # use same range for all augmenters
>>> area_range = [(0.1, 1.0), (0.2, 1.0), (0.2, 1.0), (0.3, 0.9), (0.5, 1.0)]
>>> min_eject_coverage = 0.3
>>> max_attempts = 50
>>> aug = mx.image.det.CreateMultiRandCropAugmenter(min_object_covered=min_object_covered,
...     aspect_ratio_range=aspect_ratio_range, area_range=area_range,
...     min_eject_coverage=min_eject_coverage, max_attempts=max_attempts,
...     skip_prob=0)
>>> aug.dumps()  # show some details

class mxnet.image.detection.DetAugmenter(**kwargs)[source]

Bases: object

Detection base augmenter

dumps()[source]

Saves the Augmenter to string

Returns:

JSON formatted string that describes the Augmenter.

Return type:

str

class mxnet.image.detection.DetBorrowAug(augmenter)[source]

Bases: DetAugmenter

Borrow a standard augmenter from image classification. This is useful when you know the label will not be affected by the augmenter.

Parameters:

augmenter (mx.image.Augmenter) – The borrowed standard augmenter which has no effect on label

dumps()[source]

Override the default one to avoid duplicate dump.

class mxnet.image.detection.DetHorizontalFlipAug(p)[source]

Bases: DetAugmenter

Random horizontal flipping.

Parameters:

p (float) – Probability in [0, 1] of flipping the image
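A horizontal flip has to mirror the bounding boxes as well as the pixels. The sketch below shows the label side of that operation in plain Python, assuming each label row is [class_id, xmin, ymin, xmax, ymax] with coordinates normalized to [0, 1]. That layout and the helper names are assumptions for illustration, not a copy of MXNet's implementation.

```python
import random


def flip_boxes_horizontally(label):
    """Mirror normalized boxes around the vertical center line of the image."""
    flipped = []
    for cls, xmin, ymin, xmax, ymax in label:
        # The old right edge becomes the new left edge, and vice versa.
        flipped.append([cls, 1.0 - xmax, ymin, 1.0 - xmin, ymax])
    return flipped


def maybe_flip(label, p, rng=random):
    """Flip the boxes with probability p; return (label, was_flipped)."""
    if rng.random() < p:
        return flip_boxes_horizontally(label), True
    return label, False


boxes = [[0, 0.25, 0.25, 0.5, 0.75]]
mirrored = flip_boxes_horizontally(boxes)
```

Note that xmin and xmax swap roles after mirroring, so the flipped box still satisfies xmin <= xmax.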

class mxnet.image.detection.DetRandomCropAug(min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 1.0), min_eject_coverage=0.3, max_attempts=50)[source]

Bases: DetAugmenter

Random cropping with constraints

Parameters:
  • min_object_covered (float, default=0.1) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.

  • min_eject_coverage (float, default=0.3) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.

  • aspect_ratio_range (tuple of floats, default=(0.75, 1.33)) – The cropped area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats, default=(0.05, 1.0)) – The cropped area of the image must contain a fraction of the supplied image within this range.

  • max_attempts (int, default=50) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.
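Both min_object_covered and min_eject_coverage are expressed as the fraction of a bounding box that survives a crop. A minimal sketch of that quantity follows, assuming boxes and crops are (xmin, ymin, xmax, ymax) tuples in normalized coordinates. The helper names are hypothetical, and the exact acceptance test MXNet applies to a candidate crop is not reproduced here.

```python
def box_area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; zero if the box is degenerate."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def covered_fraction(box, crop):
    """Fraction of the box's area that lies inside the crop window."""
    intersection = (max(box[0], crop[0]), max(box[1], crop[1]),
                    min(box[2], crop[2]), min(box[3], crop[3]))
    area = box_area(box)
    return box_area(intersection) / area if area > 0 else 0.0


def eject_small_boxes(label, crop, min_eject_coverage):
    """Keep only rows whose box retains at least min_eject_coverage of its area."""
    kept = []
    for cls, xmin, ymin, xmax, ymax in label:
        if covered_fraction((xmin, ymin, xmax, ymax), crop) >= min_eject_coverage:
            kept.append([cls, xmin, ymin, xmax, ymax])
    return kept


crop = (0.25, 0.25, 1.0, 1.0)
label = [[0, 0.0, 0.0, 0.5, 0.5],   # only a quarter of this box is inside the crop
         [1, 0.5, 0.5, 1.0, 1.0]]   # this box is fully inside the crop
kept = eject_small_boxes(label, crop, min_eject_coverage=0.3)
```

A real crop would also clip the surviving boxes to the crop window and rescale them to the new image size; that step is left out to keep the sketch short.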

class mxnet.image.detection.DetRandomPadAug(aspect_ratio_range=(0.75, 1.33), area_range=(1.0, 3.0), max_attempts=50, pad_val=(128, 128, 128))[source]

Bases: DetAugmenter

Random padding augmenter.

Parameters:
  • aspect_ratio_range (tuple of floats, default=(0.75, 1.33)) – The padded area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats, default=(1.0, 3.0)) – The padded area of the image must be larger than the original area

  • max_attempts (int, default=50) – Number of attempts at generating a padded region of the image of the specified constraints. After max_attempts failures, return the original image.

  • pad_val (float or tuple of float, default=(128, 128, 128)) – Pixel value used to fill padded regions when padding is enabled.
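Padding enlarges the canvas, so normalized box coordinates must be re-expressed relative to the new size. The sketch below illustrates that remapping under the assumption that labels are [class_id, xmin, ymin, xmax, ymax] in normalized coordinates and that the source image is placed at a pixel offset inside the padded canvas. The function pad_boxes is hypothetical and not part of MXNet.

```python
def pad_boxes(label, offset, src_size, canvas_size):
    """Re-normalize boxes after placing a (w, h) image at offset (x0, y0)
    inside a larger (W, H) canvas. All sizes and offsets are in pixels."""
    x0, y0 = offset
    w, h = src_size
    canvas_w, canvas_h = canvas_size
    out = []
    for cls, xmin, ymin, xmax, ymax in label:
        out.append([cls,
                    (xmin * w + x0) / canvas_w,
                    (ymin * h + y0) / canvas_h,
                    (xmax * w + x0) / canvas_w,
                    (ymax * h + y0) / canvas_h])
    return out


# A box covering the whole 100x100 image, centered on a 200x200 canvas.
padded = pad_boxes([[1, 0.0, 0.0, 1.0, 1.0]],
                   offset=(50, 50), src_size=(100, 100), canvas_size=(200, 200))
```

In this example the canvas has 4 times the original area, which falls outside the default area_range of (1.0, 3.0); the numbers were chosen only to keep the arithmetic easy to follow.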

class mxnet.image.detection.DetRandomSelectAug(aug_list, skip_prob=0)[source]

Bases: DetAugmenter

Randomly select one augmenter to apply, with chance to skip all.

Parameters:
  • aug_list (list of DetAugmenter) – The random selection will be applied to one of the augmenters

  • skip_prob (float) – The probability to skip all augmenters and return input directly

dumps()[source]

Override default.
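The selection behavior described for DetRandomSelectAug can be sketched as follows. This is a plain-Python illustration that treats augmenters as callables taking and returning (src, label); the function apply_random_select and the toy augmenter are assumptions for the example, not MXNet code.

```python
import random


def apply_random_select(aug_list, skip_prob, src, label, rng=random):
    """Skip every augmenter with probability skip_prob; otherwise apply
    exactly one augmenter picked uniformly at random from aug_list."""
    if not aug_list or rng.random() < skip_prob:
        # Return the input untouched.
        return src, label
    aug = rng.choice(aug_list)
    return aug(src, label)


def tag_label(src, label):
    """Toy augmenter that marks the label so the effect is visible."""
    return src, label + ["augmented"]


skipped = apply_random_select([tag_label], 1.0, "image", [])
applied = apply_random_select([tag_label], 0.0, "image", [])
```

This pattern fits the list-valued min_object_covered case shown earlier, where several crop augmenters with different constraints are created and only one is used per sample.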

class mxnet.image.detection.ImageDetIter(batch_size, data_shape, path_imgrec=None, path_imglist=None, path_root=None, path_imgidx=None, shuffle=False, part_index=0, num_parts=1, aug_list=None, imglist=None, data_name='data', label_name='label', last_batch_handle='pad', **kwargs)[source]

Bases: ImageIter

Image iterator with a large number of augmentation choices for detection.

Parameters:
  • aug_list (list or None) – Augmenter list for generating distorted images

  • batch_size (int) – Number of examples per batch.

  • data_shape (tuple) – Data shape in (channels, height, width) format. For now, only RGB images with 3 channels are supported.

  • path_imgrec (str) – Path to image record file (.rec). Created with tools/im2rec.py or bin/im2rec.

  • path_imglist (str) – Path to image list (.lst). Created with tools/im2rec.py or with custom script. Format: Tab separated record of index, one or more labels and relative_path_from_root.

  • imglist (list) – A list of images with the label(s). Each item is a list [imagelabel: float or list of float, imgpath].

  • path_root (str) – Root folder of image files.

  • path_imgidx (str) – Path to image index file. Needed for partition and shuffling when using .rec source.

  • shuffle (bool) – Whether to shuffle all images at the start of each iteration or not. Can be slow for HDD.

  • part_index (int) – Partition index.

  • num_parts (int) – Total number of partitions.

  • data_name (str) – Data name for provided symbols.

  • label_name (str) – Name for detection labels

  • last_batch_handle (str, optional) – How to handle the last batch. This parameter can be ‘pad’ (default), ‘discard’ or ‘roll_over’. If ‘pad’, the last batch will be padded with data starting from the beginning. If ‘discard’, the last batch will be discarded. If ‘roll_over’, the remaining elements will be rolled over to the next iteration.

  • kwargs – More arguments for creating augmenter. See mx.image.CreateDetAugmenter.
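One reading of the three last_batch_handle modes, sketched in plain Python on a list of sample indices. The helper make_batches is hypothetical and is not part of MXNet; how ImageDetIter actually carries rolled-over samples into the next epoch is not shown here.

```python
def make_batches(items, batch_size, last_batch_handle="pad"):
    """Split items into batches and return (batches, leftover).

    leftover is non-empty only for 'roll_over', where the incomplete tail is
    handed back so the caller can prepend it to the next pass.
    """
    full = len(items) // batch_size
    batches = [items[i * batch_size:(i + 1) * batch_size] for i in range(full)]
    tail = items[full * batch_size:]
    leftover = []
    if tail:
        if last_batch_handle == "pad":
            # Fill the missing slots with samples taken from the beginning.
            need = batch_size - len(tail)
            batches.append(tail + [items[i % len(items)] for i in range(need)])
        elif last_batch_handle == "discard":
            pass  # drop the incomplete batch
        elif last_batch_handle == "roll_over":
            leftover = tail
        else:
            raise ValueError("unknown last_batch_handle: %r" % last_batch_handle)
    return batches, leftover


samples = list(range(5))
padded = make_batches(samples, 2, "pad")
discarded = make_batches(samples, 2, "discard")
rolled = make_batches(samples, 2, "roll_over")
```

With ‘pad’ every sample is seen at least once per pass at the cost of a few duplicates, while ‘discard’ and ‘roll_over’ keep all batches free of duplicates.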

augmentation_transform(data, label)[source]

Overrides the base method to transform input data and labels with the specified augmentations.

check_label_shape(label_shape)[source]

Checks if the new label shape is valid

draw_next(color=None, thickness=2, mean=None, std=None, clip=True, waitKey=None, window_name='draw_next', id2labels=None)[source]

Display next image with bounding boxes drawn.

Parameters:
  • color (tuple) – Bounding box color in RGB, use None for random color

  • thickness (int) – Bounding box border thickness

  • mean (True or numpy.ndarray) – Compensate for the mean to have better visual effect

  • std (True or numpy.ndarray) – Revert standard deviations

  • clip (bool) – If true, clip to [0, 255] for better visual effect

  • waitKey (None or int) – Hold the window for waitKey milliseconds if set, skip plotting if None

  • window_name (str) – Plot window name if waitKey is set.

  • id2labels (dict) – Mapping of labels id to labels name.

Returns:

numpy.ndarray

Examples

>>> # use draw_next to get images with bounding boxes drawn
>>> import mxnet as mx
>>> iterator = mx.image.ImageDetIter(1, (3, 600, 600), path_imgrec='train.rec')
>>> for image in iterator.draw_next(waitKey=None):
...     pass  # display or process the image here
>>> # or let draw_next display using cv2 module
>>> for image in iterator.draw_next(waitKey=0, window_name='disp'):
...     pass

next()[source]

Override the function for returning next batch.

reshape(data_shape=None, label_shape=None)[source]

Reshape iterator for data_shape or label_shape.

Parameters:
  • data_shape (tuple or None) – Reshape the data_shape to the new shape if not None

  • label_shape (tuple or None) – Reshape label shape to new shape if not None

sync_label_shape(it, verbose=False)[source]

Synchronize label shape with the input iterator. This is useful when train/validation iterators have different label padding.

Parameters:
  • it (ImageDetIter) – The other iterator to synchronize

  • verbose (bool) – Print verbose log if true

Returns:

The synchronized other iterator, the internal label shape is updated as well.

Return type:

ImageDetIter

Examples

>>> import mxnet as mx
>>> train_iter = mx.image.ImageDetIter(32, (3, 300, 300), path_imgrec='train.rec')
>>> val_iter = mx.image.ImageDetIter(32, (3, 300, 300), path_imgrec='val.rec')
>>> train_iter.label_shape
(30, 6)
>>> val_iter.label_shape
(25, 6)
>>> val_iter = train_iter.sync_label_shape(val_iter, verbose=False)
>>> train_iter.label_shape
(30, 6)
>>> val_iter.label_shape
(30, 6)
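The effect of synchronizing label shapes can be illustrated without MXNet. The sketch below assumes a label shape is (max_objects, object_width) and that unused rows are filled with -1. Both the fill value and the helper names are assumptions for illustration rather than documented behavior.

```python
def synced_label_shape(shape_a, shape_b):
    """Element-wise maximum of two label shapes."""
    return tuple(max(a, b) for a, b in zip(shape_a, shape_b))


def pad_label(label, shape, pad_value=-1.0):
    """Pad a list of object rows up to shape = (rows, width) with pad_value."""
    rows, width = shape
    # Widen short rows first, then append filler rows.
    padded = [list(row) + [pad_value] * (width - len(row)) for row in label]
    padded += [[pad_value] * width for _ in range(rows - len(padded))]
    return padded


common = synced_label_shape((30, 6), (25, 6))
one_object = [[0.0, 0.1, 0.2, 0.3, 0.4, 0.0]]
fixed_size = pad_label(one_object, (3, 6))
```

Padding to a common shape is what lets a network bound to the training label shape consume validation batches without rebinding.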