tvm.relay.vision#

Vision network related operators.

Functions:

all_class_non_max_suppression(boxes, scores)

Non-maximum suppression operator for object detection, corresponding to ONNX NonMaxSuppression and TensorFlow combined_non_max_suppression.

get_valid_counts(data, score_threshold[, ...])

Get valid count of bounding boxes given a score threshold.

multibox_prior(data[, sizes, ratios, steps, ...])

Generate prior (anchor) boxes from data, sizes and ratios.

multibox_transform_loc(cls_prob, loc_pred, ...)

Location transformation for multibox detection.

non_max_suppression(data, valid_count, indices)

Non-maximum suppression operator for object detection.

proposal(cls_prob, bbox_pred, im_info, ...)

Proposal operator.

regular_non_max_suppression(boxes, scores, ...)

Regular non-maximum suppression operator for object detection, corresponding to TFLite's regular NMS.

roi_align(data, rois, pooled_size, spatial_scale)

ROI align operator.

roi_pool(data, rois, pooled_size, spatial_scale)

ROI pool operator.

yolo_reorg(data, stride)

Yolo reorg operation used in darknet models.

tvm.relay.vision.all_class_non_max_suppression(boxes, scores, max_output_boxes_per_class=-1, iou_threshold=-1.0, score_threshold=-1.0, output_format='onnx')#

Non-maximum suppression operator for object detection, corresponding to ONNX NonMaxSuppression and TensorFlow combined_non_max_suppression. NMS is performed for each class separately.

Parameters#

boxes : relay.Expr

3-D tensor with shape (batch_size, num_boxes, 4)

scores : relay.Expr

3-D tensor with shape (batch_size, num_classes, num_boxes)

max_output_boxes_per_class : int or relay.Expr, optional

The maximum number of selected output boxes per class.

iou_threshold : float or relay.Expr, optional

IoU test threshold.

score_threshold : float or relay.Expr, optional

Score threshold used to filter out low-scoring boxes early.

output_format : string, optional

“onnx” or “tensorflow”. Specifies which frontend the outputs are intended to be consumed by.

Returns#

out : relay.Tuple

If output_format is “onnx”, the output is a relay.Tuple of two tensors. The first is indices of size (batch_size * num_classes * num_boxes, 3) and the second is a one-element tensor num_total_detection of shape (1,) representing the total number of selected boxes. The three values in each row of indices encode batch, class, and box indices. Rows of indices are ordered such that selected boxes from batch 0, class 0 come first, in descending order of score, followed by boxes from batch 0, class 1, and so on. Of the batch_size * num_classes * num_boxes rows of indices, only the first num_total_detection rows are valid.

If output_format is “tensorflow”, the output is a relay.Tuple of three tensors. The first is indices of size (batch_size, num_classes * num_boxes, 2), the second is scores of size (batch_size, num_classes * num_boxes), and the third is num_total_detection of size (batch_size,) representing the total number of selected boxes per batch. The two values in each entry of indices encode class and box indices. Of the num_classes * num_boxes entries in indices at batch b, only the first num_total_detection[b] entries are valid. The second axis of indices and scores is sorted by box score within each class, but not across classes, so the box indices and scores for class 0 come first in sorted order, followed by those for class 1, and so on.
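As an illustration of the “onnx” layout described above, the following pure-Python sketch keeps only the valid rows and groups box indices by (batch, class). It operates on plain nested lists standing in for the tensor values obtained after running the operator. The helper name group_onnx_nms_output and the sample values are hypothetical and not part of TVM.

```python
def group_onnx_nms_output(indices, num_total_detection):
    """Group selected box indices by (batch, class).

    indices: rows of [batch_index, class_index, box_index]
    num_total_detection: one-element sequence holding the number of valid rows
    """
    num_valid = int(num_total_detection[0])
    grouped = {}
    # Only the first num_valid rows carry meaningful values.
    for batch_idx, class_idx, box_idx in indices[:num_valid]:
        grouped.setdefault((batch_idx, class_idx), []).append(box_idx)
    return grouped


# Hypothetical output values: 3 valid rows followed by rows to be ignored.
indices = [[0, 0, 3], [0, 0, 1], [0, 1, 2], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
num_total_detection = [3]
selected = group_onnx_nms_output(indices, num_total_detection)
```

Because valid rows are already ordered by descending score within each (batch, class) group, each list in the result preserves that ranking.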

tvm.relay.vision.get_valid_counts(data, score_threshold, id_index=0, score_index=1)#

Get valid count of bounding boxes given a score threshold. Also moves valid boxes to the top of input data.

Parameters#

data : relay.Expr

Input data. 3-D tensor with shape [batch_size, num_anchors, 6].

score_threshold : float, optional

Lower limit of score for valid bounding boxes.

id_index : int, optional

Index of the class categories; -1 to disable.

score_index : int, optional

Index of the scores/confidence of boxes.

Returns#

valid_count : relay.Expr

1-D tensor for valid number of boxes.

out_tensor : relay.Expr

Rearranged data tensor.

out_indices : relay.Expr

Indices in input data.
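The behaviour described above (count the boxes that pass the threshold and move them to the top) can be sketched in pure Python on nested lists. This is a reference sketch, not TVM's implementation: the function name get_valid_counts_ref is hypothetical, and the strict ">" comparison, the treatment of negative class ids as invalid, and the -1 padding of trailing slots are assumptions.

```python
def get_valid_counts_ref(data, score_threshold, id_index=0, score_index=1):
    """Reference sketch over data shaped [batch_size, num_anchors, elem_len]."""
    valid_count, out_tensor, out_indices = [], [], []
    for boxes in data:
        # Assumption: a box is valid if its score exceeds the threshold and,
        # when id_index is enabled, its class id is non-negative.
        keep = [
            i for i, row in enumerate(boxes)
            if row[score_index] > score_threshold
            and (id_index < 0 or row[id_index] >= 0)
        ]
        pad = len(boxes) - len(keep)
        width = len(boxes[0]) if boxes else 0
        valid_count.append(len(keep))
        # Valid boxes first, in their original order; padding assumed to be -1.
        out_tensor.append([list(boxes[i]) for i in keep]
                          + [[-1.0] * width for _ in range(pad)])
        out_indices.append(keep + [-1] * pad)
    return valid_count, out_tensor, out_indices


data = [[
    [0, 0.9, 0, 0, 1, 1],
    [1, 0.1, 0, 0, 1, 1],   # score below threshold
    [-1, 0.8, 0, 0, 1, 1],  # negative class id
    [2, 0.5, 0, 0, 1, 1],
]]
valid_count, out_tensor, out_indices = get_valid_counts_ref(data, score_threshold=0.3)
```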

tvm.relay.vision.multibox_prior(data, sizes=(1.0,), ratios=(1.0,), steps=(-1.0, -1.0), offsets=(0.5, 0.5), clip=False)#

Generate prior (anchor) boxes from data, sizes and ratios.

Parameters#

data : relay.Expr

The input data tensor.

sizes : tuple of float, optional

Tuple of sizes for anchor boxes.

ratios : tuple of float, optional

Tuple of ratios for anchor boxes.

steps : tuple of float, optional

Priorbox step across y and x, -1 for auto calculation.

offsets : tuple of float, optional

Priorbox center offsets, y and x respectively.

clip : boolean, optional

Whether to clip out-of-boundary boxes.

Returns#

out : relay.Expr

3-D tensor with shape [1, h_in * w_in * (num_sizes + num_ratios - 1), 4]
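To make the output shape formula concrete, here is a small arithmetic sketch. The helper multibox_prior_shape is hypothetical (it is not a TVM function), and h_in and w_in are taken to be the spatial height and width of the input feature map, as the formula suggests.

```python
def multibox_prior_shape(h_in, w_in, sizes=(1.0,), ratios=(1.0,)):
    """Return the output shape given by the formula above."""
    # Each feature-map cell yields num_sizes + num_ratios - 1 anchors.
    anchors_per_cell = len(sizes) + len(ratios) - 1
    return (1, h_in * w_in * anchors_per_cell, 4)


# Example values chosen for illustration only.
shape = multibox_prior_shape(38, 38, sizes=(0.1, 0.141), ratios=(1.0, 2.0, 0.5))
# 2 sizes and 3 ratios give 4 anchors per cell, so shape == (1, 5776, 4)
```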

tvm.relay.vision.multibox_transform_loc(cls_prob, loc_pred, anchor, clip=True, threshold=0.01, variances=(0.1, 0.1, 0.2, 0.2), keep_background=False)#

Location transformation for multibox detection.

Parameters#

cls_prob : tvm.relay.Expr

Class probabilities.

loc_pred : tvm.relay.Expr

Location regression predictions.

anchor : tvm.relay.Expr

Prior anchor boxes.

clip : boolean, optional

Whether to clip out-of-boundary boxes.

threshold : float, optional

Threshold to be a positive prediction.

variances : tuple of float, optional

Variances used to decode the box regression output.

keep_background : boolean, optional

Whether to keep boxes detected as background or not.

Returns#

ret : tuple of tvm.relay.Expr

tvm.relay.vision.non_max_suppression(data, valid_count, indices, max_output_size=-1, iou_threshold=0.5, force_suppress=False, top_k=-1, coord_start=2, score_index=1, id_index=0, return_indices=True, invalid_to_bottom=False)#

Non-maximum suppression operator for object detection.

Parameters#

data : relay.Expr

3-D tensor with shape [batch_size, num_anchors, 6] or [batch_size, num_anchors, 5]. The last dimension should be in the format [class_id, score, box_left, box_top, box_right, box_bottom] or [score, box_left, box_top, box_right, box_bottom]. It could be the second output out_tensor of get_valid_counts.

valid_count : relay.Expr

1-D tensor for valid number of boxes. It could be the output valid_count of get_valid_counts.

indices : relay.Expr

2-D tensor with shape [batch_size, num_anchors], representing the index of each box in the original data. It could be the third output out_indices of get_valid_counts. If get_valid_counts is not used before non_max_suppression, the values in the second dimension are like the output of arange(num_anchors).

max_output_size : int or relay.Expr, optional

Max number of output valid boxes for each instance. Return all valid boxes if the value of max_output_size is less than 0.

iou_threshold : float or relay.Expr, optional

Non-maximum suppression threshold.

force_suppress : bool, optional

Suppress all detections regardless of class_id.

top_k : int, optional

Keep at most the top k detections before NMS; -1 for no limit.

coord_start : int, optional

The starting index of the consecutive 4 coordinates.

score_index : int, optional

Index of the scores/confidence of boxes.

id_index : int, optional

Index of the class categories; -1 to disable.

return_indices : bool, optional

Whether to return box indices in input data.

invalid_to_bottom : bool, optional

Whether to move all valid bounding boxes to the top.

Returns#

out : relay.Expr or relay.Tuple

If return_indices is False, returns a relay.Expr: a 3-D tensor with shape [batch_size, num_anchors, 6] or [batch_size, num_anchors, 5]. If return_indices is True, returns a relay.Tuple of two 2-D tensors, with shape [batch_size, num_anchors] and [batch_size, num_valid_anchors] respectively.
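For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of greedy non-maximum suppression over rows in the six-element layout [class_id, score, box_left, box_top, box_right, box_bottom]. It illustrates the roles of iou_threshold and force_suppress only. It is not TVM's implementation: the names iou and greedy_nms are hypothetical, the ">=" comparison is an assumption, and options such as top_k, max_output_size and valid_count are left out.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(rows, iou_threshold=0.5, force_suppress=False):
    """Return indices of kept rows, highest score first."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][1], reverse=True)
    kept = []
    for i in order:
        suppressed = False
        for j in kept:
            # Without force_suppress, only boxes of the same class compete.
            competes = force_suppress or rows[i][0] == rows[j][0]
            if competes and iou(rows[i][2:6], rows[j][2:6]) >= iou_threshold:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    return kept


rows = [
    [0, 0.9, 0, 0, 10, 10],
    [0, 0.8, 1, 1, 11, 11],    # overlaps row 0 (IoU = 81/119), same class
    [1, 0.7, 0, 0, 10, 10],    # same box as row 0, different class
    [0, 0.6, 20, 20, 30, 30],  # far away from every other box
]
per_class = greedy_nms(rows)                        # [0, 2, 3]
class_agnostic = greedy_nms(rows, force_suppress=True)  # [0, 3]
```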

tvm.relay.vision.proposal(cls_prob, bbox_pred, im_info, scales, ratios, feature_stride, threshold, rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_min_size, iou_loss)#

Proposal operator.

Parameters#

cls_prob : relay.Expr

4-D tensor with shape [batch, 2 * num_anchors, height, width].

bbox_pred : relay.Expr

4-D tensor with shape [batch, 4 * num_anchors, height, width].

im_info : relay.Expr

2-D tensor with shape [batch, 3]. The last dimension should be in the format [im_height, im_width, im_scale].

scales : list/tuple of float

Scales of anchor windows.

ratios : list/tuple of float

Ratios of anchor windows.

feature_stride : int

The size of the receptive field of each unit in the convolution layer of the RPN, for example the product of all strides prior to this layer.

threshold : float

Non-maximum suppression threshold.

rpn_pre_nms_top_n : int

Number of top-scoring boxes to apply NMS to. -1 to use all boxes.

rpn_post_nms_top_n : int

Number of top-scoring boxes to keep after applying NMS to RPN proposals.

rpn_min_size : int

Minimum height or width in proposal.

iou_loss : bool

Usage of IoU loss.

Returns#

output : relay.Expr

2-D tensor with shape [batch * rpn_post_nms_top_n, 5]. The last dimension is in format of [batch_index, w_start, h_start, w_end, h_end].
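The shapes listed above can be checked against each other with a little arithmetic. The sketch below assumes that num_anchors equals len(scales) * len(ratios), that is, one anchor per (scale, ratio) pair; this page does not state that relationship, so treat it as an assumption. The helper proposal_shapes and the sample values are hypothetical.

```python
def proposal_shapes(batch, height, width, scales, ratios, rpn_post_nms_top_n):
    """Return the expected input and output shapes as plain tuples."""
    # Assumption: one anchor per (scale, ratio) combination.
    num_anchors = len(scales) * len(ratios)
    return {
        "cls_prob": (batch, 2 * num_anchors, height, width),
        "bbox_pred": (batch, 4 * num_anchors, height, width),
        "im_info": (batch, 3),
        "output": (batch * rpn_post_nms_top_n, 5),
    }


# Illustrative values only.
shapes = proposal_shapes(
    batch=1, height=14, width=14,
    scales=(4.0, 8.0, 16.0, 32.0), ratios=(0.5, 1.0, 2.0),
    rpn_post_nms_top_n=300,
)
```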

tvm.relay.vision.regular_non_max_suppression(boxes, scores, max_detections_per_class, max_detections, num_classes, iou_threshold, score_threshold)#

Regular non-maximum suppression operator for object detection, corresponding to TFLite’s regular NMS. NMS is performed for each class separately.

Parameters#

boxes : relay.Expr

3-D tensor with shape (batch_size, num_boxes, 4). The four values in boxes encode the (ymin, xmin, ymax, xmax) coordinates of a box.

scores : relay.Expr

3-D tensor with shape (batch_size, num_boxes, num_classes_with_background)

max_detections_per_class : int

The maximum number of selected output boxes per class.

max_detections : int

The maximum number of selected output boxes.

num_classes : int

The number of classes, excluding background.

iou_threshold : float

IoU test threshold.

score_threshold : float

Score threshold used to filter out low-scoring boxes early.

Returns#

out : relay.Tuple

The output is a relay.Tuple of four tensors. The first is detection_boxes of size (batch_size, max_detections, 4), the second is detection_classes of size (batch_size, max_detections), the third is detection_scores of size (batch_size, max_detections), and the fourth is num_detections of size (batch_size,) representing the total number of selected boxes per batch.

tvm.relay.vision.roi_align(data, rois, pooled_size, spatial_scale, sample_ratio=-1, layout='NCHW', mode='avg')#

ROI align operator.

Parameters#

data : relay.Expr

4-D tensor with shape [batch, channel, height, width]

rois : relay.Expr

2-D tensor with shape [num_roi, 5]. The last dimension should be in the format [batch_index, w_start, h_start, w_end, h_end].

pooled_size : list/tuple of two ints

Output size.

spatial_scale : float

Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of the total stride in the convolutional layers, and should be in the range (0.0, 1.0].

sample_ratio : int, optional

Sampling ratio of ROI align. An adaptive size is used by default.

mode : str, optional

The pooling method. Relay supports two methods, ‘avg’ and ‘max’. Default is ‘avg’.

Returns#

output : relay.Expr

4-D tensor with shape [num_roi, channel, pooled_size, pooled_size]
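The relationship between spatial_scale and the total stride can be illustrated with a short sketch that maps one ROI from raw-image coordinates to feature-map coordinates. It covers only the coordinate scaling, not the bilinear sampling or pooling that follows, and the helper roi_to_feature_map is hypothetical rather than part of TVM.

```python
def roi_to_feature_map(roi, total_stride):
    """Scale an ROI [batch_index, w_start, h_start, w_end, h_end] by 1 / total_stride."""
    spatial_scale = 1.0 / total_stride
    if not 0.0 < spatial_scale <= 1.0:
        raise ValueError("spatial_scale should be in the range (0.0, 1.0]")
    batch_index, w_start, h_start, w_end, h_end = roi
    # The batch index is not a coordinate and is left unscaled.
    return (
        int(batch_index),
        w_start * spatial_scale,
        h_start * spatial_scale,
        w_end * spatial_scale,
        h_end * spatial_scale,
    )


# With a total stride of 16, spatial_scale is 0.0625.
scaled = roi_to_feature_map((0, 32.0, 64.0, 160.0, 192.0), total_stride=16)
```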

tvm.relay.vision.roi_pool(data, rois, pooled_size, spatial_scale, layout='NCHW')#

ROI pool operator.

Parameters#

data : relay.Expr

4-D tensor with shape [batch, channel, height, width]

rois : relay.Expr

2-D tensor with shape [num_roi, 5]. The last dimension should be in the format [batch_index, w_start, h_start, w_end, h_end].

pooled_size : list/tuple of two ints

Output size.

spatial_scale : float

Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of the total stride in the convolutional layers, and should be in the range (0.0, 1.0].

Returns#

output : relay.Expr

4-D tensor with shape [num_roi, channel, pooled_size, pooled_size]

tvm.relay.vision.yolo_reorg(data, stride)#

Yolo reorg operation used in darknet models. This layer shuffles the input tensor values based on the stride value. Along with the shuffling, it does the shape transform. If ‘(n, c, h, w)’ is the data shape and ‘s’ is stride, output shape is ‘(n, c*s*s, h/s, w/s)’.

Example:

data(1, 4, 2, 2) = [[[[ 0  1] [ 2  3]]
                    [[ 4  5] [ 6  7]]
                    [[ 8  9] [10 11]]
                    [[12 13] [14 15]]]]
stride = 2
ret(1, 16, 1, 1) = [[[[ 0]]  [[ 2]]  [[ 8]]  [[10]]
                    [[ 1]]  [[ 3]]  [[ 9]]  [[11]]
                    [[ 4]]  [[ 6]]  [[12]]  [[14]]
                    [[ 5]]  [[ 7]]  [[13]]  [[15]]]]

Note

stride=1 has no significance for reorg operation.
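The example above can be reproduced with a pure-Python sketch that uses darknet-style index arithmetic on a flattened (row-major) tensor. This is an illustrative reference, not TVM's kernel: the function name reorg_flat is hypothetical, and the sketch assumes the channel count is divisible by stride * stride.

```python
def reorg_flat(flat, shape, stride):
    """Reorg a row-major flattened (n, c, h, w) tensor; returns (values, out_shape)."""
    n, c, h, w = shape
    if c % (stride * stride) != 0:
        raise ValueError("this sketch assumes c is divisible by stride * stride")
    out_c = c // (stride * stride)
    out = []
    for b in range(n):
        for k in range(c):
            c2 = k % out_c
            offset = k // out_c
            for j in range(h):
                for i in range(w):
                    # Source position, computed as if the input were laid out
                    # as (n, out_c, h * stride, w * stride).
                    w2 = i * stride + offset % stride
                    h2 = j * stride + offset // stride
                    src = w2 + w * stride * (h2 + h * stride * (c2 + out_c * b))
                    out.append(flat[src])
    return out, (n, c * stride * stride, h // stride, w // stride)


# data(1, 4, 2, 2) from the example holds the values 0..15 in row-major order.
values, out_shape = reorg_flat(list(range(16)), (1, 4, 2, 2), stride=2)
# values == [0, 2, 8, 10, 1, 3, 9, 11, 4, 6, 12, 14, 5, 7, 13, 15]
# out_shape == (1, 16, 1, 1)
```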

Parameters#

data : relay.Expr

The input data tensor.

stride : int

The stride value for reorganisation.

Returns#

ret : relay.Expr

The computed result.