tvm.autotvm

tvm.autotvm#

The auto-tuning module of tvm

This module includes:

  • Tuning space definition API

  • Efficient auto-tuners

  • Tuning result and database support

  • Distributed measurement to scale up tuning

tvm.autotvm.apply_history_best(records)#

Apply the history best config

Parameters#

records: None, Records, or iterator of Records objects, where a Records object is a path-like object, a file-like object, or an iterator of (MeasureInput, MeasureResult).

Collection of tuning records. If multiple Records objects are passed, their contents will be merged.

Parameters:

records (None | str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]] | Iterable[str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]]])
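To make the merging behaviour concrete, here is a minimal pure-Python sketch. It does not call TVM: the `(target, workload, config)` and `(costs, error_no)` tuples are simplified stand-ins for MeasureInput and MeasureResult, `pick_history_best` is a hypothetical helper, and treating `error_no == 0` as success is an assumption.

```python
from statistics import mean


def pick_history_best(*record_sets):
    """Merge record collections; keep the lowest-mean-cost config per (target, workload)."""
    best = {}
    for records in record_sets:
        for (target, workload, config), (costs, error_no) in records:
            if error_no != 0:  # assumption: 0 means the measurement succeeded
                continue
            cost = mean(costs)
            key = (target, workload)
            if key not in best or cost < best[key][1]:
                best[key] = (config, cost)
    return best


# Two hypothetical logs that are merged before the best entries are chosen.
log_a = [(("llvm", "matmul_512", "cfg0"), ([0.5, 0.25], 0)),
         (("llvm", "matmul_512", "cfg1"), ([0.125, 0.125], 0))]
log_b = [(("llvm", "matmul_512", "cfg2"), ([0.0625], 1)),  # failed run, ignored
         (("cuda", "matmul_512", "cfg3"), ([0.25], 0))]

best = pick_history_best(log_a, log_b)
```

Keying on the (target, workload) pair is what lets a single merged log serve several devices at once in this sketch.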

tvm.autotvm.measure#

User facing API for specifying how to measure the generated code

class tvm.autotvm.measure.MeasureInput(target, task, config)[source]#

Stores all the necessary inputs for a measurement.

Parameters#

target: tvm.target.Target

The target device

task: task.Task

Task function

config: ConfigEntity

Specific configuration.

class tvm.autotvm.measure.MeasureResult(costs, error_no, all_cost, timestamp)[source]#

Stores all the results of a measurement

Parameters#

costs: Array of float or Array of Exception

If no error occurs during measurement, it is an array of measured running times. If an error occurs during measurement, it is an array of the exception objects.

error_no: int

Denotes the error type, defined by MeasureErrorNo

all_cost: float

Total cost of this measurement, including RPC, compilation, and test runs

timestamp: float

The absolute time stamp when we finish measurement.
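The following sketch shows one way a consumer might read these four fields. It uses a local namedtuple as a stand-in rather than importing TVM; `mean_cost` is a hypothetical helper, and the convention that `error_no == 0` means "no error" is an assumption.

```python
from collections import namedtuple

# Stand-in with the same four fields as the documented class.
MeasureResult = namedtuple("MeasureResult", ["costs", "error_no", "all_cost", "timestamp"])


def mean_cost(result):
    """Average running time, or infinity when the measurement failed."""
    if result.error_no != 0:  # assumption: 0 denotes "no error"
        return float("inf")
    return sum(result.costs) / len(result.costs)


ok = MeasureResult(costs=[0.25, 0.75], error_no=0, all_cost=1.5, timestamp=0.0)
failed = MeasureResult(costs=[RuntimeError("timeout")], error_no=1, all_cost=10.0, timestamp=0.0)
```

Mapping failures to infinity keeps failed configs comparable with successful ones when sorting by cost.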

tvm.autotvm.measure.measure_option(builder, runner)[source]#

Set options for measurement. Measuring a config means building it and then running it, so options must be set for both steps. Each step has its own options for timeout, parallelism, and so on.

Parameters#

builder: Builder

Specify how to build programs

runner: Runner

Specify how to run programs

Examples#

>>> # example setting for using local devices
>>> measure_option = autotvm.measure_option(
>>>     builder=autotvm.LocalBuilder(),  # use all local cpu cores for compilation
>>>     runner=autotvm.LocalRunner(  # measure them sequentially
>>>         number=10,
>>>         timeout=5)
>>> )

>>> # example setting for using remote devices
>>> measure_option = autotvm.measure_option(
>>>     builder=autotvm.LocalBuilder(),  # use all local cpu cores for compilation
>>>     runner=autotvm.RPCRunner(
>>>         'rasp3b', 'localhost', 9190,  # device key, host and port of the rpc tracker
>>>         number=4,
>>>         timeout=4)  # timeout of a run on the device. RPC request waiting time is excluded.
>>> )

Note#

To make measurement results accurate, you should pick the correct values for the arguments number and repeat in Runner(). Some devices need a certain minimum running time to "warm up", such as GPUs that need time to reach a performance power state. Setting min_repeat_ms dynamically adjusts number, so it is recommended. The typical value for NVIDIA GPUs is 150 ms.
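The effect of min_repeat_ms on number can be sketched as below. This is only an illustration: `adjust_number` is a hypothetical helper, the per-run time is passed in rather than measured, and the doubling strategy is an assumption that may differ from what the runtime actually does.

```python
def adjust_number(per_run_ms, number, min_repeat_ms):
    """Grow `number` until one repeat (number runs) lasts at least min_repeat_ms."""
    if per_run_ms <= 0:
        raise ValueError("per_run_ms must be positive")
    while per_run_ms * number < min_repeat_ms:
        number *= 2  # assumption: simple doubling, not TVM's exact rule
    return number


fast_kernel = adjust_number(per_run_ms=1.0, number=4, min_repeat_ms=150)
slow_kernel = adjust_number(per_run_ms=50.0, number=4, min_repeat_ms=150)
```

A fast kernel ends up with many more runs per repeat, while a slow kernel keeps the number it was given.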

tvm.autotvm.measure.create_measure_batch(task, option)[source]#

Get a standard measure_batch function.

Parameters#

task: tvm.autotvm.task.Task

The tuning task

option: dict

The option for measuring generated code. You should use the return value of function measure_option for this argument.

Returns#

measure_batch: callable

a callback function to measure a batch of configs

class tvm.autotvm.measure.measure_methods.LocalBuilder(timeout=10, n_parallel=None, build_kwargs=None, build_func='default', do_fork=False, runtime=None)[source]#

Run compilation on local machine

Parameters#

timeout: float

The timeout of a compilation

n_parallel: int

The number of tasks run in parallel. "None" will use all cpu cores

build_kwargs: dict

If supplied, additional kwargs passed to build_func. Overrides any build_kwargs supplied by the Runner.

build_func: callable or str

If it is 'default', use the default build function. If it is 'ndk', use the function for Android NDK. If it is 'stackvm', use the function for stackvm. If it is callable, use it as a custom build function (expects a lib_format field).

do_fork: bool

If False, do not fork when building. Requires n_parallel=1.

runtime: Optional[Runtime]

Specify the runtime to generate artifacts for

class tvm.autotvm.measure.measure_methods.RPCRunner(key, host, port, priority=1, timeout=10, n_parallel=None, number=4, repeat=3, min_repeat_ms=0, cooldown_interval=0.1, enable_cpu_cache_flush=False, module_loader=None)[source]#

Run generated code on remote devices. This function will ask an RPC Tracker for a device to measure on.

Parameters#

timeout: float

The timeout of a RPCRunner measurement task

n_parallel: int

The number of tasks run in parallel. "None" will use all cpu cores

key: str

The key of the device registered in the tracker

host: str

The host address of RPC Tracker

port: int

The port of RPC Tracker

number: int

The number of times to run the generated code for taking the average. Together, these runs make up one repeat of the measurement.

repeat: int, optional

The number of times to repeat the measurement. In total, the generated code will be run (1 + number x repeat) times, where the first "1" is a warm-up run that is discarded. The returned result contains repeat costs, each of which is an average of number costs.

min_repeat_ms: int, optional

The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, number is dynamically adjusted to meet the minimum duration requirement of one repeat: when the run time of one repeat falls below this duration, number is automatically increased.

cooldown_interval: float, optional

The cool down interval between two measurements.

enable_cpu_cache_flush: bool

Whether to flush cache on CPU between repeated measurements. Flushing cache can make the measured latency of one operator closer to its actual latency during end-to-end inference. To make this option effective, the argument number should also be set to 1. This only has an effect on CPU tasks.

module_loader: ModuleLoader

If given, a context manager that loads the module to be timed into the remote runtime. If not given, default_module_loader is used.
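The relationship between number, repeat, and the returned costs can be checked with a few lines of plain Python. This is a sketch of the arithmetic described above, not the runner's implementation; `total_runs` and `repeat_costs` are hypothetical helper names.

```python
def total_runs(number, repeat):
    """One warm-up run plus `number` runs for each of the `repeat` repeats."""
    return 1 + number * repeat


def repeat_costs(run_times, number, repeat):
    """Drop the warm-up run, then average each group of `number` runs."""
    if len(run_times) != total_runs(number, repeat):
        raise ValueError("unexpected number of run times")
    timed = run_times[1:]  # the first run is warm-up and is discarded
    return [sum(timed[i * number:(i + 1) * number]) / number for i in range(repeat)]


# Made-up timings: a slow warm-up run followed by three repeats of four runs.
times = [9.0] + [1.0] * 4 + [2.0] * 4 + [3.0] * 4
costs = repeat_costs(times, number=4, repeat=3)
```

With the default number=4 and repeat=3, the generated code runs 13 times and the result holds 3 costs.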

class tvm.autotvm.measure.measure_methods.LocalRunner(timeout=10, number=4, repeat=3, min_repeat_ms=0, cooldown_interval=0.1, enable_cpu_cache_flush=False, module_loader=None)[source]#

Run generated code on local devices.

Parameters#

timeout: float

The timeout of a measurement task

number: int

The number of times to run the generated code for taking the average. Together, these runs make up one repeat of the measurement.

repeat: int, optional

The number of times to repeat the measurement. In total, the generated code will be run (1 + number x repeat) times, where the first one is a warm-up run that is discarded. The returned result contains repeat costs, each of which is an average of number costs.

min_repeat_ms: int, optional

The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, number is dynamically adjusted to meet the minimum duration requirement of one repeat: when the run time of one repeat falls below this duration, number is automatically increased.

cooldown_interval: float, optional

The cool down interval between two measurements.

enable_cpu_cache_flush: bool

Whether to flush cache on CPU between repeated measurements. Flushing cache can make the measured latency of one operator closer to its actual latency during end-to-end inference. To make this option effective, the argument number should also be set to 1. This only has an effect on CPU tasks.

Note#

This is a "fake" local mode. We start a silent RPC tracker and RPC server for the user. In this way, we reuse the timeout/isolation mechanism of the RPC infrastructure.

tvm.autotvm.tuner#

A tuner takes a task as input. It proposes some promising ConfigEntity objects in the ConfigSpace and measures them on real hardware. Then it proposes the next batch of ConfigEntity objects according to the measurement results. This tuning loop is repeated.
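The propose, measure, update loop described above can be sketched without TVM. `ToyGridTuner`, `tune`, and `fake_cost` are hypothetical names; configs are plain integer indices and the "hardware" is a made-up cost function, so this only mirrors the shape of the loop, not the real tuner.

```python
class ToyGridTuner:
    """Proposes indices 0..space_size-1 in order and remembers the best one."""

    def __init__(self, space_size):
        self.space_size = space_size
        self.cursor = 0
        self.best = (None, float("inf"))  # (config index, cost)

    def has_next(self):
        return self.cursor < self.space_size

    def next_batch(self, batch_size):
        end = min(self.cursor + batch_size, self.space_size)
        batch = list(range(self.cursor, end))
        self.cursor = end
        return batch

    def update(self, inputs, results):
        for cfg, cost in zip(inputs, results):
            if cost < self.best[1]:
                self.best = (cfg, cost)


def tune(tuner, measure, n_trial, batch_size=4):
    """Repeat propose -> measure -> update until n_trial configs were tried."""
    trials = 0
    while trials < n_trial and tuner.has_next():
        batch = tuner.next_batch(min(batch_size, n_trial - trials))
        results = [measure(cfg) for cfg in batch]  # stands in for real hardware
        tuner.update(batch, results)
        trials += len(batch)
    return tuner.best


def fake_cost(cfg):
    """Made-up cost function with its minimum at index 6."""
    return (cfg - 6) ** 2 + 1.0


best = tune(ToyGridTuner(space_size=10), fake_cost, n_trial=8)
```

A model-based tuner would differ mainly in `update` and `next_batch`, where measured results steer which configs are proposed next.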

class tvm.autotvm.tuner.Tuner(task, **kwargs)[source]#

Base class for tuners

Parameters#

task: autotvm.task.Task

Tuning Task

has_next()[source]#

Whether there is an untried config left in the space

Returns#

has_next: bool

load_history(data_set, min_seed_records=500)[source]#

Load history data for transfer learning

Parameters#

data_set: Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair

Previous tuning records

min_seed_records: int

Defaults to 500. Indicates the minimum number of records to train the tuner with. If data_set contains fewer than min_seed_records records, the tuner will not be trained.

next_batch(batch_size)[source]#

Get the next batch of configs to be measured on real hardware

Parameters#

batch_size: int

The size of the batch

Returns#

a batch of configs

reset()[source]#

Reset the status of the tuner

set_error_threshold(threshold)[source]#

Modify the error counter threshold, which controls the switch to debug mode

Parameters#

threshold: New threshold value

tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')[source]#

Begin tuning

Parameters#

n_trial: int

Maximum number of configs to try (measure on real hardware)

measure_option: dict

The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.

early_stopping: int, optional

Stop tuning early if no better config is found within this number of trials

callbacks: List of callable

A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.

si_prefix: str

One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.
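The callback signature and the early_stopping rule can be illustrated with a small stand-alone sketch. Nothing here is TVM code: `BestTracker` and `run_with_callbacks` are hypothetical, results are plain floats instead of MeasureResult objects, and the tuner argument is passed as None.

```python
class BestTracker:
    """Callback invoked as callback(tuner, inputs, results) after each batch."""

    def __init__(self):
        self.best_cost = float("inf")
        self.trials_since_best = 0

    def __call__(self, tuner, inputs, results):
        for _inp, cost in zip(inputs, results):
            if cost < self.best_cost:
                self.best_cost = cost
                self.trials_since_best = 0
            else:
                self.trials_since_best += 1


def run_with_callbacks(batches, callbacks, early_stopping=None):
    """Feed measured batches to the callbacks; stop after too many trials without progress."""
    tracker = BestTracker()
    seen = 0
    for inputs, results in batches:
        for callback in [tracker, *callbacks]:
            callback(None, inputs, results)  # no return value is expected
        seen += len(inputs)
        if early_stopping is not None and tracker.trials_since_best >= early_stopping:
            break
    return seen


batches = [(["a", "b"], [3.0, 2.0]), (["c", "d"], [5.0, 4.0]), (["e", "f"], [1.0, 0.5])]
log = []
seen = run_with_callbacks(batches, [lambda t, i, r: log.extend(zip(i, r))], early_stopping=2)
```

In this run the third batch is never reached, because two trials in a row fail to improve on the best cost.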

update(inputs, results)[source]#

Update parameters of the tuner according to measurement results

Parameters#

inputs: Array of autotvm.measure.MeasureInput

The input for measurement

results: Array of autotvm.measure.MeasureResult

The results of measurement

class tvm.autotvm.tuner.RandomTuner(task, range_idx=None)[source]#

Enumerate the search space in a random order

Parameters#

task: autotvm.task.Task

Tuning Task

range_idx: Optional[Tuple[int, int]]

The index range to draw random configs from

has_next()#

Whether there is an untried config left in the space

Returns#

has_next: bool

load_history(data_set, min_seed_records=500)#

Load history data for transfer learning

Parameters#

data_set: Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair

Previous tuning records

min_seed_records: int

Defaults to 500. Indicates the minimum number of records to train the tuner with. If data_set contains fewer than min_seed_records records, the tuner will not be trained.

next_batch(batch_size)[source]#

Get the next batch of configs to be measured on real hardware

Parameters#

batch_size: int

The size of the batch

Returns#

a batch of configs

reset()#

Reset the status of the tuner

set_error_threshold(threshold)#

Modify the error counter threshold, which controls the switch to debug mode

Parameters#

threshold: New threshold value

tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')#

Begin tuning

Parameters#

n_trial: int

Maximum number of configs to try (measure on real hardware)

measure_option: dict

The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.

early_stopping: int, optional

Stop tuning early if no better config is found within this number of trials

callbacks: List of callable

A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.

si_prefix: str

One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.

update(inputs, results)#

Update parameters of the tuner according to measurement results

Parameters#

inputs: Array of autotvm.measure.MeasureInput

The input for measurement

results: Array of autotvm.measure.MeasureResult

The results of measurement

class tvm.autotvm.tuner.GridSearchTuner(task, range_idx=None)[source]#

Enumerate the search space in a grid search order

has_next()#

Whether there is an untried config left in the space

Returns#

has_next: bool

load_history(data_set, min_seed_records=500)#

Load history data for transfer learning

Parameters#

data_set: Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair

Previous tuning records

min_seed_records: int

Defaults to 500. Indicates the minimum number of records to train the tuner with. If data_set contains fewer than min_seed_records records, the tuner will not be trained.

next_batch(batch_size)[source]#

Get the next batch of configs to be measured on real hardware

Parameters#

batch_size: int

The size of the batch

Returns#

a batch of configs

reset()#

Reset the status of the tuner

set_error_threshold(threshold)#

Modify the error counter threshold, which controls the switch to debug mode

Parameters#

threshold: New threshold value

tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')#

Begin tuning

Parameters#

n_trial: int

Maximum number of configs to try (measure on real hardware)

measure_option: dict

The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.

early_stopping: int, optional

Stop tuning early if no better config is found within this number of trials

callbacks: List of callable

A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.

si_prefix: str

One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.

update(inputs, results)#

Update parameters of the tuner according to measurement results

Parameters#

inputs: Array of autotvm.measure.MeasureInput

The input for measurement

results: Array of autotvm.measure.MeasureResult

The results of measurement

class tvm.autotvm.tuner.GATuner(task, pop_size=100, elite_num=3, mutation_prob=0.1)[source]#

Tuner with genetic algorithm. This tuner does not have a cost model, so it always runs measurements on real machines. This tuner expands the ConfigEntity as a gene.

Parameters#

pop_size: int

number of genes in one generation

elite_num: int

number of elite to keep

mutation_prob: float

probability of mutation of a knob in a gene
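How pop_size, elite_num, and mutation_prob interact can be sketched with a toy generation step. This is not the tuner's code: `next_generation` is a hypothetical helper, genes are tuples holding one choice index per knob, parents are picked uniformly (an assumption; a real implementation may weight them by score), and `knob_sizes` must list at least two knobs for the crossover to work.

```python
import random


def next_generation(genes, scores, elite_num, mutation_prob, knob_sizes, rng):
    """Build the next generation: keep elites, then fill up with mutated crossovers."""
    ranked = sorted(zip(scores, genes), key=lambda pair: pair[0], reverse=True)
    children = [gene for _score, gene in ranked[:elite_num]]  # elites survive unchanged
    while len(children) < len(genes):  # population size stays constant
        parent_a, parent_b = rng.sample(genes, 2)  # uniform parent choice (assumption)
        cut = rng.randrange(1, len(knob_sizes))  # one-point crossover position
        child = list(parent_a[:cut] + parent_b[cut:])
        for i, size in enumerate(knob_sizes):
            if rng.random() < mutation_prob:
                child[i] = rng.randrange(size)  # mutate this knob
        children.append(tuple(child))
    return children


rng = random.Random(0)
genes = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)]  # one choice per knob
scores = [0.1, 0.9, 0.5, 0.2]  # higher is better
new_genes = next_generation(genes, scores, elite_num=2, mutation_prob=0.1,
                            knob_sizes=[4, 4, 4], rng=rng)
```

Because there is no cost model, every gene in the new generation still has to be measured before it can be scored.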

has_next()[source]#

Whether there is an untried config left in the space

Returns#

has_next: bool

load_history(data_set, min_seed_records=500)[source]#

Load history data for transfer learning

Parameters#

data_set: Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair

Previous tuning records

min_seed_records: int

Defaults to 500. Indicates the minimum number of records to train the tuner with. If data_set contains fewer than min_seed_records records, the tuner will not be trained.

next_batch(batch_size)[source]#

Get the next batch of configs to be measured on real hardware

Parameters#

batch_size: int

The size of the batch

Returns#

a batch of configs

reset()#

Reset the status of the tuner

set_error_threshold(threshold)#

Modify the error counter threshold, which controls the switch to debug mode

Parameters#

threshold: New threshold value

tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')#

Begin tuning

Parameters#

n_trial: int

Maximum number of configs to try (measure on real hardware)

measure_option: dict

The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.

early_stopping: int, optional

Stop tuning early if no better config is found within this number of trials

callbacks: List of callable

A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.

si_prefix: str

One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.

update(inputs, results)[source]#

Update parameters of the tuner according to measurement results

Parameters#

inputs: Array of autotvm.measure.MeasureInput

The input for measurement

results: Array of autotvm.measure.MeasureResult

The results of measurement

class tvm.autotvm.tuner.XGBTuner(task, plan_size=64, feature_type='itervar', loss_type='reg', num_threads=None, optimizer='sa', diversity_filter_ratio=None, log_interval=50)[source]#

Tuner that uses xgboost as cost model

Parameters#

task: Task

The tuning task

plan_size: int

The size of a plan. After plan_size trials, the tuner will refit a new cost model and do planning for the next plan_size trials.

feature_type: str, optional

If it is 'itervar', use features extracted from IterVar (loop variable). If it is 'knob', use the flattened ConfigEntity directly. If it is 'curve', use the sampled curve feature (relation feature).

Note on choosing the feature type: For single-task tuning, 'itervar' and 'knob' are good. 'itervar' is more accurate but 'knob' is much faster. There are some constraints on 'itervar'; if you run into problems with feature extraction when using 'itervar', you can switch to 'knob'.

For cross-shape tuning (e.g. many convolutions with different shapes), 'itervar' and 'curve' have better transferability, while 'knob' is faster.

For cross-device or cross-operator tuning, you can use 'curve' only.

loss_type: str

If it is 'reg', use regression loss to train the cost model; the cost model predicts the normalized flops. If it is 'rank', use pairwise rank loss to train the cost model; the cost model predicts a relative rank score. If it is 'rank-binary', use pairwise rank loss with binarized labels to train the cost model; the cost model predicts a relative rank score.

num_threads: int, optional

The number of threads.

optimizer: str or ModelOptimizer, optional

If is 'sa', use a default simulated annealing optimizer. Otherwise it should be a ModelOptimizer object.

diversity_filter_ratio: int or float, optional

If it is not None, the tuner will first select the top-(plan_size * diversity_filter_ratio) candidates according to the cost model and then pick batch_size of them according to the diversity metric.

log_interval: int = 50

The verbose level. If it is 0, output nothing. Otherwise, output debug information every log_interval iterations.

has_next()#

Whether there is an untried config left in the space

Returns#

has_next: bool

load_history(data_set, min_seed_records=500)#

Load history data for transfer learning

Parameters#

data_set: Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair

Previous tuning records

min_seed_records: int

Defaults to 500. Indicates the minimum number of records to train the tuner with. If data_set contains fewer than min_seed_records records, the tuner will not be trained.

next_batch(batch_size)#

Get the next batch of configs to be measured on real hardware

Parameters#

batch_size: int

The size of the batch

Returns#

a batch of configs

reset()#

Reset the status of the tuner

set_error_threshold(threshold)#

Modify the error counter threshold, which controls the switch to debug mode

Parameters#

threshold: New threshold value

tune(*args, **kwargs)[source]#

Begin tuning

Parameters#

n_trial: int

Maximum number of configs to try (measure on real hardware)

measure_option: dict

The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.

early_stopping: int, optional

Stop tuning early if no better config is found within this number of trials

callbacks: List of callable

A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.

si_prefix: str

One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.

update(inputs, results)#

Update parameters of the tuner according to measurement results

Parameters#

inputs: Array of autotvm.measure.MeasureInput

The input for measurement

results: Array of autotvm.measure.MeasureResult

The results of measurement

Namespace of callback utilities of AutoTVM

class tvm.autotvm.tuner.callback.Monitor[source]#

A monitor to collect statistics during tuning

trial_scores()[source]#

Get the scores (currently flops) of all trials

trial_timestamps()[source]#

Get the wall clock timestamps of all trials

tvm.autotvm.tuner.callback.log_to_database(db)[source]#

Save the tuning records to a database object.

Parameters#

db: Database

The database

tvm.autotvm.tuner.callback.log_to_file(file_out, protocol='json')[source]#

Log the tuning records into a file. The rows of the log are stored in the format of autotvm.record.encode.

Parameters#

file_out: File or str

The file to log to.

protocol: str, optional

The log protocol. Can be 'json' or 'pickle'

Returns#

callback: callable

Callback function to do the logging.
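The general pattern, a factory that returns a logging callback, can be sketched as follows. This is not the library function: `log_to_stream` is a hypothetical name, and the one-JSON-object-per-line layout is made up for illustration and is not the autotvm.record.encode format.

```python
import io
import json


def log_to_stream(stream):
    """Return a callback that appends one JSON line per (input, result) pair."""

    def _callback(_tuner, inputs, results):
        for inp, res in zip(inputs, results):
            # Hypothetical record layout, chosen only for this example.
            stream.write(json.dumps({"input": inp, "result": res}) + "\n")

    return _callback


buf = io.StringIO()  # any writable text stream works, including an open file
callback = log_to_stream(buf)
callback(None, ["cfg0", "cfg1"], [[0.5], [0.25]])
lines = buf.getvalue().splitlines()
```

Writing one record per line keeps the log appendable, so a later tuning session can extend the same file.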

tvm.autotvm.tuner.callback.progress_bar(total, prefix='', si_prefix='G')[source]#

Display progress bar for tuning

Parameters#

total: int

The total number of trials

prefix: str

The prefix of output message

si_prefix: str

SI prefix for flops

tvm.autotvm.task#

Task is a tunable composition of template functions.

A tuner takes a tunable task and optimizes the joint configuration space of all the template functions in the task. This module defines the task data structure, as well as a collection (zoo) of typical tasks of interest.

Definition of task function.

Task can be constructed from tuple of func, args, and kwargs. func is a state-less function, or a string that registers the standard task.

exception tvm.autotvm.task.task.FlopCalculationError[source]#

Error raised when estimating FLOP for a compute op

class tvm.autotvm.task.task.MissingTask(taskname)[source]#

Dummy task template for a task lookup which cannot be resolved. This can occur if the task being requested from _lookup_task() has not been imported in this run.

Parameters:

taskname (str)

class tvm.autotvm.task.task.Task(name, args)[source]#

A Tunable Task

Parameters#

name: str

The name of the task.

args: Tuple

Positional argument of func

instantiate(config)[source]#

Instantiate this task function (template) with a config. Returns the corresponding schedule.

Parameters#

config: template.ConfigEntity

parameter config for this template

Returns#

sch: tvm.te.schedule.Schedule

The tvm schedule

arg_bufs: Array of te.tensor.Tensor

The input/output buffers

class tvm.autotvm.task.task.TaskTemplate[source]#

Task template is used to create a tunable AutoTVM task.

It can be defined by a pair of compute and schedule functions using _register_task_compute and _register_task_schedule, or by a customized task creation function that is more flexible using _register_customized_task.

Note that when a customized func is registered, the compute and schedule functions will be ignored.

tvm.autotvm.task.task._register_customized_task(name, func=None)[source]#

Register a customized function to AutoTVM task.

Parameters#

name: str

The task name

func: None or callable

If it is None, return a decorator. If it is callable, decorate this function.

Returns#

decorator: callable

A decorator

tvm.autotvm.task.task._register_task_compute(name, func=None)[source]#

Register compute function to autotvm task

Parameters#

name: str

The task name

func: None or callable

If it is None, return a decorator. If it is callable, decorate this function.

Returns#

decorator: callable

A decorator

tvm.autotvm.task.task._register_task_schedule(name, func=None)[source]#

Register schedule function to autotvm task

Parameters#

name: str

The task name

func: None or callable

If it is None, return a decorator. If it is callable, decorate this function.

Returns#

decorator: callable

A decorator

tvm.autotvm.task.task.args_to_workload(args, task_name=None)[source]#

Convert an argument list to a hashable workload tuple. This function converts lists to tuples, TVM nodes to Python values, and flattens te.tensor.Tensor to a tuple.

Parameters#

task_name: str

The AutoTVM task name

args: list of args

The arguments to the function

Returns#

ret: hashable

The hashable value
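The idea of turning an argument list into a hashable key can be sketched in plain Python. `to_hashable` and `workload_key` are hypothetical helpers; this simplified version only handles nested lists, tuples, and scalars, and makes no attempt to handle TVM nodes or tensors.

```python
def to_hashable(value):
    """Recursively convert lists to tuples so the result can be a dict key."""
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(v) for v in value)
    if value is None or isinstance(value, (str, int, float)):
        return value
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def workload_key(task_name, args):
    """Prefix the converted arguments with the task name."""
    return (task_name,) + to_hashable(args)


key = workload_key("matmul", [512, 512, [1, 2], "float32"])
cache = {key: "best config"}  # usable as a dictionary key
```

Equal argument lists always produce equal keys, which is what makes lookups of previously tuned workloads possible.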

tvm.autotvm.task.task.compute_flop(sch)[source]#

Calculate the number of FLOP (floating-point operations) of the compute ops in a schedule

Parameters#

sch: tvm.te.schedule.Schedule

schedule

Returns#

flop: int

number of FLOP in this schedule

tvm.autotvm.task.task.create(task_name, args, target, target_host=None)[source]#

Create a tuning task and initialize its search space

Parameters#

task_name: str

The AutoTVM task name

args: List

Positional arguments

target: Target

The compilation target

target_host: Target, optional

The compilation target for host side

Returns#

tsk: Task

a task object

tvm.autotvm.task.task.deserialize_args(args)[source]#

The inverse function of serialize_args.

Parameters#

args: list of hashable or Tensor

tvm.autotvm.task.task.get_config()[source]#

Get current config object

Returns#

cfg: ConfigSpace or ConfigEntity

The current config

tvm.autotvm.task.task.serialize_args(args)[source]#

Serialize the arguments of a topi function to a hashable tuple.

Parameters#

args: list of hashable or Tensor

tvm.autotvm.task.task.template(task_name, func=None)[source]#

Decorate a function as a tunable schedule template.

Parameters#

task_name: str

The task name

func: None or callable

A callable template function. If it is None, return a decorator. If it is callable, decorate this function.

Returns#

func: callable

The decorated function

Examples#

The following code is a tunable template for a blocked matrix multiplication:

from tvm import autotvm, te

@autotvm.template("matmul")
def matmul(N, L, M, dtype):
    A = te.placeholder((N, L), name='A', dtype=dtype)
    B = te.placeholder((L, M), name='B', dtype=dtype)

    k = te.reduce_axis((0, L), name='k')
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name='C')
    s = te.create_schedule(C.op)

    # schedule
    y, x = s[C].op.axis
    k = s[C].op.reduce_axis[0]

    ##### define space begin #####
    cfg = autotvm.get_config()
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)
    ##### define space end #####

    # schedule according to config
    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)

    s[C].reorder(yo, xo, k, yi, xi)

    return s, [A, B, C]

Template configuration space.

Each template function can be parameterized by a ConfigSpace. The space is declared when we invoke the template function with ConfigSpace. During evaluation, we pass in a ConfigEntity, which contains a specific entity in the space. This entity contains deterministic parameters.

exception tvm.autotvm.task.space.InstantiationError[source]#

Actively detected error in instantiating a template with a config, raised by cfg.raise_error, e.g. too much unrolling or too many threads in a block

class tvm.autotvm.task.space.AnnotateEntity(anns)[source]#

An annotation operation with detailed parameters that can apply to axes

Parameters#

anns: Array of string

The annotations of axes

apply(sch, op, axes, axis_lens=None, max_unroll=None, vec_size=None, cfg=None, source=None)[source]#

Apply annotation to an array of axes

Parameters#

sch: tvm.te.schedule.Schedule

The tvm schedule

op: tvm.te.Operation

The stage to be applied

axes: Array of tvm.te.schedule.IterVar

axes to annotate

axis_lens: Array of int, optional

the length of axes

max_unroll: int, optional

maximum unroll step

vec_size: Array of int, optional

valid vector lanes for vectorization

cfg: ConfigEntity, optional

cfg for recording error

source: Array of Array tensor, optional

source tensor for attaching cache

Returns#

axes: list of tvm.te.schedule.IterVar

The transformed axes

class tvm.autotvm.task.space.AnnotateSpace(axes, policy, **kwargs)[source]#

The parameter space for annotating an array of axes

_generate_space(now, tmp_stack)[source]#

Generate space by DFS

static get_num_output(axes, policy, **kwargs)[source]#

Get the number of output axes after this transform

Returns#

n: int

number of output axes

class tvm.autotvm.task.space.Axis(space, index)#

index#

Alias for field number 1

space#

Alias for field number 0

class tvm.autotvm.task.space.ConfigEntity(index, code_hash, entity_map, constraints)[source]#

A configuration with detailed parameters

Parameters#

index: int

index of this config in space

code_hash: str

hash of schedule code

entity_map: dict

map name to transform entity

constraints: list

List of constraints

static from_json_dict(json_dict)[source]#

Build a ConfigEntity from json serializable dictionary

Parameters#

json_dict: dict

Json serializable dictionary. This should be the return value of to_json_dict.

Returns#

config: ConfigEntity

The corresponding config object

get_flatten_feature()[source]#

Flatten entities to a numerical one-dimensional feature vector

Returns#

fea: np.array

one dimensional float32 array

get_other_option()[source]#

Returns#

other_option: dict

other tunable parameters (tunable parameters defined by cfg.define_knob)

to_json_dict()[source]#

Convert to a json serializable dictionary

Returns#

json_dict: dict

a json serializable dictionary

class tvm.autotvm.task.space.ConfigSpace[source]#

The configuration space of a schedule. Pass it as config in a template to collect the transformation space and build the transform graph of axes.

__getitem__(name)[source]#

Get the transform entity (knob) of this entity by name.

Do not use this to get a ConfigEntity of this space (use ConfigSpace.get instead).

Parameters#

name: str

name of the transform

__len__()[source]#

Returns the number of valid indexes in the space

_add_new_transform(space_class, name, axes, policy, **kwargs)[source]#

Add a new transform space in template

add_flop(flop)[source]#

Add floating-point operation statistics for this tuning task

Parameters#

flop: int or float or IntImm or FloatImm

number of floating-point operations

static axis(var)[source]#

Get a virtual axis (axis placeholder)

Parameters#

var: int or tvm.te.schedule.IterVar

If it is an int, return an axis whose length is the provided argument. If it is an IterVar, return an axis whose length is extracted from the IterVar's extent domain.

clear_cache()[source]#

Clears the cache of index validity

define_annotate(name, axes, policy, **kwargs)[source]#

Define a new tunable knob which annotates a list of axes

Parameters#

name: str

name to index the entity of this space

axes: Array of tvm.te.schedule.IterVar

axes to annotate

policy: str

Name of the policy. If it is 'unroll', unroll the axes. If it is 'try_unroll', try to unroll the axes. If it is 'try_unroll_vec', try to unroll or vectorize the axes. If it is 'bind_gpu', bind the first few axes to GPU threads. If it is 'locate_cache', choose n axes to attach shared/local cache.

kwargs: dict

extra arguments for policy

define_knob(name, candidate)[source]#

Define a tunable knob with a list of candidates

Parameters#

name: str

name key of that option

candidate: list

list of candidates

define_reorder(name, axes, policy, **kwargs)[source]#

Define a new tunable knob which reorders a list of axes

Parameters#

name: str

name to index the entity of this space

axes: Array of tvm.te.schedule.IterVar

axes to reorder

policy: str

Name of the policy. If it is 'identity', do an identity permutation. If it is 'all', try all permutations. If it is 'interval_all', try all permutations of an interval of axes. If it is 'candidate', try the listed candidates. If it is 'interleave', interleave chains of spatial axes and chains of reduction axes.

kwargs: dict

extra arguments for policy

define_split(name, axis, policy='factors', **kwargs)[source]#

Define a new tunable knob which splits an axis into a list of axes

Parameters#

name: str

name to index the entity of this space

axis: tvm.te.schedule.IterVar

axis to split

policy: str

Name of the policy. If 'factors', the tuner will try all divisible factors. If 'power2', the tuner will try power-of-two factors less than or equal to the length. If 'verbose', the tuner will try all candidates from the above two policies. If 'candidate', try the given candidates.

**kwargs:

extra arguments for policy

max_factor:

the maximum split factor (int).

filter:

see examples below for how to use filter (Callable[[int], bool]).

num_outputs:

the total number of axes after the split (int).

no_tail:

whether to include only divisible numbers as split factors (bool).

candidate:

(policy=candidate) manual candidate list (List).

Examples#

>>> # use custom candidates
>>> cfg.define_split('tile_x', x, policy='candidate', num_outputs=3,
...   candidate=[[1, 4, 4], [4, 1, 4]])
>>> # use a filter that only accepts split schemes whose innermost tile is at most 4
>>> cfg.define_split('tile_y', y, policy='factors', num_outputs=3,
...   filter=lambda x: x.size[-1] <= 4)

get(index)[source]#

Get a config entity with detailed parameters from this space

Parameters#

index: int

index in the space

Returns#

config: ConfigEntity

config corresponding to the index

get_next_index(index, n=1, start=None, end=None)[source]#

Returns the nth valid next index or None if out of range

Parameters#

index: int

specifying at which position to start, inclusive

n: int, optional

step used to find the next index; use a negative number to search in the opposite direction

start: int, optional

start of subrange, inclusive

end: int, optional

end of subrange, exclusive

Returns#

next: int

next index in the space
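The stepping behaviour described above can be sketched in plain Python without TVM. This is only a minimal sketch under assumptions: validity is decided by a caller-supplied predicate, the search begins at the position after `index`, and `next_valid_index` and `is_valid` are hypothetical names rather than the library's implementation.

```python
def next_valid_index(index, is_valid, range_length, n=1, start=None, end=None):
    """Return the n-th valid index after `index` (before it when n < 0), or None."""
    if n == 0:
        raise ValueError("n must be non-zero")
    lo = 0 if start is None else start            # inclusive lower bound
    hi = range_length if end is None else end     # exclusive upper bound
    step = 1 if n > 0 else -1
    remaining = abs(n)
    current = index
    while remaining:
        current += step
        if current < lo or current >= hi:
            return None                           # left the [lo, hi) subrange
        if is_valid(current):
            remaining -= 1
    return current


# Hypothetical space of 10 indexes in which only even indexes are valid.
def is_even(i):
    return i % 2 == 0


forward_one = next_valid_index(0, is_even, 10)         # 2
forward_two = next_valid_index(0, is_even, 10, n=2)    # 4
backward = next_valid_index(8, is_even, 10, n=-1)      # 6
exhausted = next_valid_index(8, is_even, 10)           # None
```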

get_rand_index(start=None, end=None, to_exclude=None)[source]#

Returns a random valid index that is not in the exclusion list

Parameters#

start: int, optional

specifying at which position to start, inclusive

end: int, optional

specifying at which position to end, exclusive

to_exclude: list, optional

indexes that must not be returned

Returns#

rand: int

random index in the space

Note

Excluding all valid space indexes will lead to an infinite loop.
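To see why excluding every valid index loops forever, consider a rejection-sampling sketch in plain Python. It assumes the index is drawn by retrying until a candidate passes; `rand_valid_index` and `is_valid` are hypothetical names, and the up-front guard is an addition of this sketch, not documented library behaviour.

```python
import random


def rand_valid_index(is_valid, range_length, start=None, end=None, to_exclude=None):
    """Draw a random valid index from [start, end) that is not excluded."""
    lo = 0 if start is None else start
    hi = range_length if end is None else end
    excluded = set(to_exclude or ())
    # Guard added in this sketch: without it, the loop below never terminates
    # when every valid index is excluded. It scans the whole subrange, so it
    # is only practical for small spaces.
    if not any(is_valid(i) and i not in excluded for i in range(lo, hi)):
        raise ValueError("no valid index left to draw")
    while True:
        candidate = random.randrange(lo, hi)
        if candidate not in excluded and is_valid(candidate):
            return candidate


def is_even(i):
    return i % 2 == 0


# Only index 8 is both valid and not excluded.
only_choice = rand_valid_index(is_even, 10, to_exclude=[0, 2, 4, 6])
```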

is_index_valid(index)[source]#

Checks if the index satisfies the multi_filter condition

Parameters#

index: int

index from the range of the space

Returns#

valid: bool

whether the index meets all the constraints

knob2point(knob)[source]#

Convert knob form (vector) to point form (single integer)

Parameters#

knob: list

knob to convert

Returns#

point: int

point of the knob representation

multi_filter(filter)[source]#

This filter can restrict combinations of parameters, in contrast to the knob filter, which restricts only a single parameter

Parameters#

filter: function

predicate with one argument (Callable[[int], bool])

Note

Using this filter changes how __len__ should be used. Normally, __len__ gives both the count of valid indexes and the range of the space. When multi_filter is enabled, use __len__ for the count of valid indexes and range_length for the range of the space. It is recommended to use is_index_valid, get_next_index and get_rand_index to traverse the space.

Examples#

>>> # Pre-requisites
>>> candidates = [[16, 64], [32, 32], [64, 16]]
>>> filter = lambda v: v.size[0] != 16
>>> multi_filter = lambda e: (e["tile_x"].size[0] + e["tile_y"].size[0]) <= 64
>>> # Case 1 - without filtering
>>> cfg.define_split("tile_x", x, num_outputs=2, policy="candidate", candidate=candidates)
>>> cfg.define_split("tile_y", y, num_outputs=2, policy="candidate", candidate=candidates)
>>> # [('tile_x', [16, 64]), ('tile_y', [16, 64])],None,0
>>> # [('tile_x', [32, 32]), ('tile_y', [16, 64])],None,1
>>> # [('tile_x', [64, 16]), ('tile_y', [16, 64])],None,2
>>> # [('tile_x', [16, 64]), ('tile_y', [32, 32])],None,3
>>> # [('tile_x', [32, 32]), ('tile_y', [32, 32])],None,4
>>> # [('tile_x', [64, 16]), ('tile_y', [32, 32])],None,5
>>> # [('tile_x', [16, 64]), ('tile_y', [64, 16])],None,6
>>> # [('tile_x', [32, 32]), ('tile_y', [64, 16])],None,7
>>> # [('tile_x', [64, 16]), ('tile_y', [64, 16])],None,8
>>> # Case 2 - with filter
>>> cfg.define_split("tile_x", x, num_outputs=2, policy="candidate", candidate=candidates,
...   filter=filter)
>>> cfg.define_split("tile_y", y, num_outputs=2, policy="candidate", candidate=candidates,
...   filter=filter)
>>> # [('tile_x', [32, 32]), ('tile_y', [32, 32])],None,0
>>> # [('tile_x', [64, 16]), ('tile_y', [32, 32])],None,1
>>> # [('tile_x', [32, 32]), ('tile_y', [64, 16])],None,2
>>> # [('tile_x', [64, 16]), ('tile_y', [64, 16])],None,3
>>> # Case 3 - with filter and multi_filter
>>> cfg.define_split("tile_x", x, num_outputs=2, policy="candidate", candidate=candidates,
...   filter=filter)
>>> cfg.define_split("tile_y", y, num_outputs=2, policy="candidate", candidate=candidates,
...   filter=filter)
>>> cfg.multi_filter(filter=multi_filter)
>>> # [('tile_x', [32, 32]), ('tile_y', [32, 32])],None,0
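The three cases above can be reproduced without TVM by enumerating the candidate combinations directly. This is a sketch under assumptions: split sizes are plain lists rather than entity objects with a `.size` attribute, and `enumerate_space` and `knob_filter` are hypothetical names.

```python
from itertools import product

candidates = [[16, 64], [32, 32], [64, 16]]


def knob_filter(size):
    # Per-knob filter: looks at one split only.
    return size[0] != 16


def multi_filter(entity):
    # Combination filter: looks at several knobs at once.
    return entity["tile_x"][0] + entity["tile_y"][0] <= 64


def enumerate_space(cands_x, cands_y):
    # tile_x varies fastest, matching the index order listed above.
    return [{"tile_x": tx, "tile_y": ty} for ty, tx in product(cands_y, cands_x)]


case1 = enumerate_space(candidates, candidates)        # no filtering
kept = [c for c in candidates if knob_filter(c)]
case2 = enumerate_space(kept, kept)                    # knob filter only
case3 = [e for e in case2 if multi_filter(e)]          # knob filter + multi_filter

print(len(case1), len(case2), len(case3))  # 9 4 1
```

In terms of the note above, Case 3 would correspond to an index range of 4 with a single valid index, which is why the count of valid indexes and the range of the space have to be read separately.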

point2knob(point)[source]#

Convert point form (single integer) to knob (vector)

Parameters#

point: int

point to convert

Returns#

knob: list

knob representation of the point
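The point and knob forms are related like a mixed-radix number. The sketch below assumes the first knob is the least significant digit, which agrees with the index order in the multi_filter example above; the standalone helpers take an explicit `dims` list and are illustrative, not the methods themselves.

```python
def point2knob(point, dims):
    """Decode a single integer into one choice index per knob."""
    knob = []
    for dim in dims:
        knob.append(point % dim)
        point //= dim
    return knob


def knob2point(knob, dims):
    """Encode one choice index per knob into a single integer."""
    point, stride = 0, 1
    for k, dim in zip(knob, dims):
        point += k * stride
        stride *= dim
    return point


dims = [3, 3]                       # two knobs with three candidates each
print(point2knob(5, dims))          # [2, 1]
print(knob2point([2, 1], dims))     # 5
```

Index 5 decodes to the third candidate for the first knob and the second candidate for the second knob, which matches row 5 of Case 1 in the multi_filter example.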

raise_error(msg)[source]#

Register an error in the config. Use this to actively detect errors when scheduling. Otherwise these errors will occur at runtime, which costs more time.

Parameters#

msg: str

random_walk(point)[source]#

Random walk as a local transition

Parameters#

point: int

index of the ConfigEntity

Returns#

new_point: int

new neighborhood index
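One way to picture a local transition is to change exactly one knob of the decoded point. The sketch below is an assumption about what "neighborhood" means here, not the library's implementation; it takes an explicit `dims` list and returns the point unchanged when no knob has more than one choice.

```python
import random


def random_walk(point, dims):
    """Return a neighbouring point that differs from `point` in one knob."""
    # Decode the point into one choice index per knob (first knob fastest).
    knob, rest = [], point
    for dim in dims:
        knob.append(rest % dim)
        rest //= dim
    # Only knobs with more than one candidate can move.
    movable = [i for i, dim in enumerate(dims) if dim > 1]
    if not movable:
        return point
    i = random.choice(movable)
    knob[i] = random.choice([v for v in range(dims[i]) if v != knob[i]])
    # Encode the modified knob back into a single integer.
    new_point, stride = 0, 1
    for k, dim in zip(knob, dims):
        new_point += k * stride
        stride *= dim
    return new_point


neighbour = random_walk(4, [3, 3])
```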

static reduce_axis(var)#

get a virtual axis (axis placeholder)

Parameters#

var: int or tvm.te.schedule.IterVar

If var is an int, return an axis whose length is the provided argument. If var is an IterVar, return an axis whose length is extracted from the IterVar's extent domain.

sample_ints(m)[source]#

Sample m different integers from [0, self.range_length) without replacement. This function is an alternative to np.random.choice when self.range_length > 2 ^ 32, in which case numpy does not work.

Parameters#

m: int

The number of integers to sample

Returns#

ints: a numpy array of size m
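Sampling without replacement from a very large range can be done with a set of already-drawn values, so the range itself never has to be materialized. The following is a minimal sketch of that idea using only the standard library; the standalone signature with `low` and `high` is an assumption, and it returns a list rather than a numpy array.

```python
import random


def sample_ints(low, high, m):
    """Sample m distinct integers from [low, high) without replacement."""
    if m > high - low:
        raise ValueError("cannot draw more distinct values than the range holds")
    seen = set()
    while len(seen) < m:
        # Duplicates are silently absorbed by the set and redrawn.
        seen.add(random.randrange(low, high))
    return list(seen)


big = 2 ** 40                      # far beyond a 32-bit range
picks = sample_ints(0, big, 5)
```

The approach is efficient when m is much smaller than the range; when m approaches the size of the range, most draws are rejected and a shuffle-based method is a better fit.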

subrange_length(start, end)[source]#

Returns the number of valid indexes within the limited range [start, end)

Parameters#

start: int

start of subrange, inclusive

end: int

end of subrange, exclusive

Returns#

count: int

number of valid indexes

valid()[source]#

Check whether the config meets all the constraints

Note

This check should be called after instantiation of task, because the ConfigEntity/ConfigSpace collects errors during instantiation

Returns#

valid: bool

whether the config meets all the constraints

property dims#

Dimensions in the space

property range_length#

Length of the index range in the space

class tvm.autotvm.task.space.FallbackConfigEntity[source]#

The config entity created to support fallback

__setitem__(name, entity)[source]#

Set the entity (knob) by name

Parameters#

name: str

name of the entity

entity: SplitEntity, ReorderEntity, AnnotateEntity, OtherOptionEntity

value of the entity

fallback_split(name, constraints)[source]#

Fallback a split knob

Parameters#

name: str

name of the knob

constraints: List of int

The maximum tile size for every dimension. Value -1 means no constraint.

Examples#

If you use cfg.define_split('tile_0', 128, num_outputs=3), then cfg.fallback_split('tile_0', [-1, 8, 4]) will give you cfg['tile_0'].size = [4, 8, 4].

If you use cfg.define_split('tile_0', 49, num_outputs=3), then cfg.fallback_split('tile_0', [-1, 8, 4]) will give you cfg['tile_0'].size = [7, 7, 1].
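Both results above are consistent with a greedy rule that works from the innermost dimension outwards and picks the largest factor of the remaining extent that fits the constraint. The sketch below implements that rule in plain Python as an assumption that reproduces the documented examples; it takes the axis extent directly instead of a knob name.

```python
def get_factors(n):
    """All positive divisors of n in ascending order."""
    return [f for f in range(1, n + 1) if n % f == 0]


def fallback_split(extent, constraints):
    """Greedy split sizes; a constraint of -1 means no upper bound."""
    limits = [float("inf") if c == -1 else c for c in constraints]
    sizes = [None] * len(limits)
    remaining = extent
    # Innermost dimension first, so tight inner tiles are satisfied first.
    for i in reversed(range(len(limits))):
        allowed = [f for f in get_factors(remaining) if f <= limits[i]]
        if not allowed:
            raise RuntimeError("no feasible split for dimension %d" % i)
        sizes[i] = allowed[-1]          # largest factor within the limit
        remaining //= sizes[i]
    return sizes


print(fallback_split(128, [-1, 8, 4]))  # [4, 8, 4]
print(fallback_split(49, [-1, 8, 4]))   # [7, 7, 1]
```

For extent 49 the innermost tile falls back to 1 because the only factors are 1, 7 and 49, and neither 7 nor 49 fits the limit of 4.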

fallback_with_reference_log(ref_log)[source]#

A data driven fallback mechanism. We use tuned parameters from TopHub as reference data. For an unseen shape, we find the most similar tuned one from TopHub and mimic its parameters. Note that we are not matching by workload (e.g., input size, kernel size), but instead matching by configuration space. The idea is that if two workloads have similar configuration space, their optimal configurations are also likely to be similar.

Parameters#

ref_log: List of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult)

The reference log

class tvm.autotvm.task.space.OtherOptionEntity(val)[source]#

The parameter entity for general option, with a detailed value

class tvm.autotvm.task.space.OtherOptionSpace(axes, policy, **kwargs)[source]#

The parameter space for general option

static get_num_output(axes, policy, **kwargs)[source]#

get number of output axes after this transform

Returns#

n: int

number of output axes

class tvm.autotvm.task.space.ReorderEntity(perm)[source]#

A reorder operation with detailed parameters that can apply to axes

Parameters#

perm: Array of int

define the permutation

apply(sch, op, axes)[source]#

Apply reorder to an array of axes

Parameters#

sch: tvm.te.schedule.Schedule

The tvm schedule

op: tvm.te.Operation

The stage to be applied

axes: Array of tvm.te.schedule.IterVar

axes to reorder

Returns#

axes: list of Axis

The transformed axes.

class tvm.autotvm.task.space.ReorderSpace(axes, policy, **kwargs)[source]#

The parameter space for ordering an array of axes

_merge_chain(chains)[source]#

Generate all combinations of merging some chains

static get_num_output(axes, policy, **kwargs)[source]#

get number of output axes after this transform

Returns#

n: int

number of output axes

class tvm.autotvm.task.space.SplitEntity(size)[source]#

A split operation with detailed parameters that can apply to an axis

Parameters#

size: Array of int

The size of every axis after the split. For example, when an axis of extent 128 is split into 3 axes, one possible size is [4, 4, 8] (4x4x8 = 128).
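To get a feel for how many such sizes exist, the exact factorizations of an extent can be enumerated recursively. This sketch assumes only splits whose product equals the extent (no tail) and ignores the policies, filters and max_factor options; `split_sizes` is a hypothetical helper.

```python
def split_sizes(extent, num_outputs):
    """All ordered lists of num_outputs positive ints whose product is extent."""
    if num_outputs == 1:
        return [[extent]]
    result = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            # Fix the outer factor, then split what is left over.
            for rest in split_sizes(extent // factor, num_outputs - 1):
                result.append([factor] + rest)
    return result


sizes = split_sizes(128, 3)
print(len(sizes))            # 36
print([4, 4, 8] in sizes)    # True
```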

apply(sch, op, axis)[source]#

Apply split to an axis

Parameters#

sch: tvm.te.schedule.Schedule

The tvm schedule

op: tvm.te.Operation

The stage to be applied

axis: tvm.te.schedule.IterVar

axis to split

Returns#

axes: list of Axis

The transformed axes.

class tvm.autotvm.task.space.SplitSpace(axes, policy, **kwargs)[source]#

Split an axis several times

_generate_space(now, tmp_stack, enforce_no_tail=False)[source]#

Generate space by DFS

static get_num_output(axes, policy, **kwargs)[source]#

get number of output axes after this transform

Returns#

n: int

number of output axes

class tvm.autotvm.task.space.TransformSpace[source]#

Base class for transform space. TransformSpace is the node in the computation graph of axes.

Note

We can regard our schedule code as a transformation graph of axes. Starting from raw axes in the definition of te.compute, we can transform these axes by some operators. The operators include 'split', 'reorder' and 'annotate'. Each operator has some tunable parameters (e.g. the split factor). Then the tuning process is just to find good parameters for these operators.

So all the combinations of the parameters of these operators form our search space.

Naming convention: We call the set of all possible values XXXSpace. (XXX can be Split, Reorder, Config ...) We call a specific entity in a space XXXEntity.

__getitem__(index)[source]#

Get an entity of the space by index

Parameters#

index: int

Returns#

transform entity

static get_num_output()[source]#

get number of output axes after this transform

Returns#

n: int

number of output axes

class tvm.autotvm.task.space.VirtualAxis(var, name=None)[source]#

Axis placeholder in template

Parameters#

var: int or tvm.te.schedule.IterVar

If var is an int, return a virtual axis whose length is the provided argument. If var is an IterVar, return a virtual axis whose length is extracted from the IterVar's extent domain.

name: str

static get_num_output(var, name=None)[source]#

get number of output axes after this transform

Returns#

n: int

number of output axes

tvm.autotvm.task.space.get_factors(n)[source]#

Return all factors of an integer

Parameters#

n: int

integer to factorize

Returns#

factors: list

List of all factors

tvm.autotvm.task.space.get_pow2s(n)[source]#

Return all power-of-two numbers that are less than or equal to the integer

Parameters#

n: int

integer for reference

Returns#

factors: list

List of all power-of-two numbers
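Both helpers are easy to picture in plain Python. The versions below are a sketch of the described behaviour, assuming a positive integer input and results in ascending order; they are not the library's source.

```python
def get_factors(n):
    """All positive divisors of n, ascending, found in O(sqrt(n)) steps."""
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            if i != n // i:           # avoid listing a square root twice
                large.append(n // i)
        i += 1
    return small + large[::-1]


def get_pow2s(n):
    """All powers of two that are less than or equal to n, ascending."""
    result, power = [], 1
    while power <= n:
        result.append(power)
        power *= 2
    return result


print(get_factors(12))   # [1, 2, 3, 4, 6, 12]
print(get_pow2s(12))     # [1, 2, 4, 8]
```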

Template dispatcher module.

A dispatcher is a function that can contain multiple behaviors. Its specific behavior can be controlled by DispatchContext.

DispatchContext is used in two ways, usually via different implementations of the DispatchContext base class.

  • During search, we can use it to pass the current proposal from the tuner.

  • During evaluation, we can use it to pick the best policy.

class tvm.autotvm.task.dispatcher.ApplyConfig(config)[source]#

Apply a deterministic config entity for all queries.

Parameters#

config: ConfigSpace or ConfigEntity

The specific configuration we care about.

_query_inside(target, workload)[source]#

Override query

update(target, workload, cfg)[source]#

Override update

class tvm.autotvm.task.dispatcher.ApplyFixedConfig(tasks, schedule_names)[source]#

Apply a config of a deterministic schedule. This is used for building a single Relay operator with deterministic schedule for testing schedules at Relay level.

Parameters#

tasks: list[tvm.autotvm.task.task.Task]

List of autoTVM tasks.

schedule_names: str, List[str]

Name of schedules to use.

_query_inside(target, workload)[source]#

Override query

update(target, workload, cfg)[source]#

Override update

Parameters:

schedule_names (str | List[str])

class tvm.autotvm.task.dispatcher.ApplyGraphBest(records)[source]#

Load the graph level tuning optimal schedules.

The input records should be in ascending order of node index for the target operators. Usually this can be obtained with the graph tuner.

This context maintains an internal counter to indicate the current node index.

Parameters:

records (str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]])

__init__(records)[source]#

Parameters#

records: str or iterator of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult)

Collection of tuning records. If it is a str, it should be the filename of a records log file.

Each row of this file is an encoded record pair.

Otherwise, it is an iterator.

Parameters:

records (str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]])

_query_inside(target, workload)[source]#

Query the context to get config from records.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

Returns#

cfg: ConfigSpace

The specific configuration.

update(target, workload, cfg)[source]#

Update context with a specific config.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

cfg: ConfigSpace

The specific configuration.

Note#

This interface is for cases when TVM decides to replace an operator in the graph. For example, the AlterOpLayout pass (enabled when opt_level = 3) replaces NCHW convolution with an NCHW[x]c implementation on x86 CPUs. Thus in TOPI, we first query the schedule using the original NCHW workload, then update the dispatcher with the new NCHW[x]c workload, so that later on the NCHW[x]c convolution can get its schedule from the dispatcher using its own workload directly.

@conv2d_alter_layout.register("cpu")
def _alter_conv2d_layout(attrs, inputs, tinfo):
    workload = get_conv2d_workload(...)
    dispatch_ctx = autotvm.task.DispatchContext.current
    target = tvm.target.Target.current()
    config = dispatch_ctx.query(target, workload)

    # Get conv2d_NCHWc workload from config
    # new_workload = ...
    # new_inputs = ...
    # new_attrs = ...

    # Store altered operator's config
    dispatch_ctx.update(target, new_workload, config)
    return sym.contrib.conv2d_NCHWc(*new_inputs, **new_attrs)

We directly store config back because conv2d_NCHW and conv2d_NCHWc share the same schedule parameters. One can construct a new ConfigEntity if this is not the case.

class tvm.autotvm.task.dispatcher.ApplyHistoryBest(records)[source]#

Apply the history best config

Parameters#

records: None, Records, or iterator of Records objects, where a Records object is a path-like object, a file-like object, or an iterator of (MeasureInput, MeasureResult).

Collection of tuning records. If multiple Records objects are passed, their contents will be merged.

load(records)[source]#

Load records to this dispatch context

Parameters#

records : str, list of str, or iterator of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult)

Collection of tuning records. If multiple Records objects are passed, their contents will be merged.

Parameters:

records (str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]] | Iterable[str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]]])

update(target, workload, cfg)[source]#

Update context with a specific config.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

cfg: ConfigSpace

The specific configuration.

Note#

This interface is for cases when TVM decides to replace an operator in the graph. For example, the AlterOpLayout pass (enabled when opt_level = 3) replaces NCHW convolution with an NCHW[x]c implementation on x86 CPUs. Thus in TOPI, we first query the schedule using the original NCHW workload, then update the dispatcher with the new NCHW[x]c workload, so that later on the NCHW[x]c convolution can get its schedule from the dispatcher using its own workload directly.

@conv2d_alter_layout.register("cpu")
def _alter_conv2d_layout(attrs, inputs, tinfo):
    workload = get_conv2d_workload(...)
    dispatch_ctx = autotvm.task.DispatchContext.current
    target = tvm.target.Target.current()
    config = dispatch_ctx.query(target, workload)

    # Get conv2d_NCHWc workload from config
    # new_workload = ...
    # new_inputs = ...
    # new_attrs = ...

    # Store altered operator's config
    dispatch_ctx.update(target, new_workload, config)
    return sym.contrib.conv2d_NCHWc(*new_inputs, **new_attrs)

We directly store config back because conv2d_NCHW and conv2d_NCHWc share the same schedule parameters. One can construct a new ConfigEntity if this is not the case.

Parameters:

records (None | str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]] | Iterable[str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]]])

class tvm.autotvm.task.dispatcher.DispatchContext[source]#

Base class of dispatch context.

DispatchContext enables the target and workload specific dispatch mechanism for templates.

_query_inside(target, workload)[source]#

Query the context to get the specific config for a template. This function only queries the config inside this context.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

Returns#

cfg: ConfigSpace

The specific configuration.

query(target, workload)[source]#

Query the context to get the specific config for a template. If the result cannot be found inside this context, this function will query it from the upper contexts.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

Returns#

cfg: ConfigSpace

The specific configuration.
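The lookup order of query can be illustrated with a small stack of nested contexts in plain Python. This is only a sketch of the idea, assuming each context remembers the one that was active when it was entered; `SketchContext` and its lookup table are hypothetical, and the string targets and configs are placeholders.

```python
class SketchContext:
    """Toy dispatch context: look inside first, then ask the outer context."""

    current = None  # innermost active context

    def __init__(self, table=None):
        self._table = dict(table or {})
        self._outer = None

    def __enter__(self):
        self._outer = SketchContext.current
        SketchContext.current = self
        return self

    def __exit__(self, exc_type, exc, tb):
        SketchContext.current = self._outer

    def _query_inside(self, target, workload):
        return self._table.get((target, workload))

    def query(self, target, workload):
        cfg = self._query_inside(target, workload)
        if cfg is None and self._outer is not None:
            cfg = self._outer.query(target, workload)  # fall back to upper context
        return cfg


outer_table = {("llvm", "conv"): "outer-cfg", ("llvm", "dense"): "dense-cfg"}
with SketchContext(outer_table):
    with SketchContext({("llvm", "conv"): "inner-cfg"}) as inner:
        found = [
            inner.query("llvm", "conv"),    # answered by the inner context
            inner.query("llvm", "dense"),   # answered by the outer context
            inner.query("llvm", "pool"),    # unknown everywhere in this sketch
        ]
print(found)  # ['inner-cfg', 'dense-cfg', None]
```

In the sketch an unknown workload yields None; as described below for FallbackContext, the real root context exists so that any tunable template can still be called.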

update(target, workload, cfg)[source]#

Update context with a specific config.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

cfg: ConfigSpace

The specific configuration.

Note#

This interface is for cases when TVM decides to replace an operator in the graph. For example, the AlterOpLayout pass (enabled when opt_level = 3) replaces NCHW convolution with an NCHW[x]c implementation on x86 CPUs. Thus in TOPI, we first query the schedule using the original NCHW workload, then update the dispatcher with the new NCHW[x]c workload, so that later on the NCHW[x]c convolution can get its schedule from the dispatcher using its own workload directly.

@conv2d_alter_layout.register("cpu")
def _alter_conv2d_layout(attrs, inputs, tinfo):
    workload = get_conv2d_workload(...)
    dispatch_ctx = autotvm.task.DispatchContext.current
    target = tvm.target.Target.current()
    config = dispatch_ctx.query(target, workload)

    # Get conv2d_NCHWc workload from config
    # new_workload = ...
    # new_inputs = ...
    # new_attrs = ...

    # Store altered operator's config
    dispatch_ctx.update(target, new_workload, config)
    return sym.contrib.conv2d_NCHWc(*new_inputs, **new_attrs)

We directly store config back because conv2d_NCHW and conv2d_NCHWc share the same schedule parameters. One can construct a new ConfigEntity if this is not the case.

class tvm.autotvm.task.dispatcher.FallbackContext[source]#

A fallback dispatch context.

Any tunable template can be called under this context. This is the root context.

clear_cache(target, workload)[source]#

Clear fallback cache. Pass the same arguments as _query_inside to this function to clear the cache.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

update(target, workload, cfg)[source]#

Update context with a specific config.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

cfg: ConfigSpace

The specific configuration.

Note#

This interface is for cases when TVM decides to replace an operator in the graph. For example, the AlterOpLayout pass (enabled when opt_level = 3) replaces NCHW convolution with an NCHW[x]c implementation on x86 CPUs. Thus in TOPI, we first query the schedule using the original NCHW workload, then update the dispatcher with the new NCHW[x]c workload, so that later on the NCHW[x]c convolution can get its schedule from the dispatcher using its own workload directly.

@conv2d_alter_layout.register("cpu")
def _alter_conv2d_layout(attrs, inputs, tinfo):
    workload = get_conv2d_workload(...)
    dispatch_ctx = autotvm.task.DispatchContext.current
    target = tvm.target.Target.current()
    config = dispatch_ctx.query(target, workload)

    # Get conv2d_NCHWc workload from config
    # new_workload = ...
    # new_inputs = ...
    # new_attrs = ...

    # Store altered operator's config
    dispatch_ctx.update(target, new_workload, config)
    return sym.contrib.conv2d_NCHWc(*new_inputs, **new_attrs)

We directly store config back because conv2d_NCHW and conv2d_NCHWc share the same schedule parameters. One can construct a new ConfigEntity if this is not the case.

tvm.autotvm.task.dispatcher.clear_fallback_cache(target, workload)[source]#

Clear fallback cache. Pass the same arguments as _query_inside to this function to clear the cache.

Parameters#

target: Target

The current target

workload: Workload

The current workload.

Note#

This is used in alter_op_layout to clear the bad cache created before calling the topi compute function.

Decorators for registering tunable templates to TOPI.

These decorators let a simple implementation use different configurations for different workloads. Here we directly use all arguments to the TOPI call as the "workload", so make sure all the arguments (except tvm.te.Tensor) in your calls are hashable. For tvm.te.Tensor, we will serialize it to a hashable tuple.

See tvm/topi/python/topi/arm_cpu/depthwise_conv2d.py for example usage.

class tvm.autotvm.task.topi_integration.TaskExtractEnv(allow_duplicate=False)[source]#

Global environment for extracting tuning tasks from graph

add_task(task_name, args)[source]#

Add AutoTVM task

Parameters#

task_name: str

AutoTVM task name.

args: tuple

Arguments to the TOPI function.

static get(allow_duplicate=False)[source]#

Get the single instance of TaskExtractEnv

Parameters#

allow_duplicate: boolean

Whether to fetch all workloads in the network, even though some of them are the same. This is useful for graph tuning.

Returns#

env: TaskExtractEnv

The single instance of TaskExtractEnv

get_tasks()[source]#

Get collected tasks

Returns#

tasks: List of tuple(name, args)

A list of tasks extracted from the graph

reset(wanted_relay_ops=None)[source]#

Reset task collections

Parameters#

wanted_relay_ops: List of tvm.ir.Op

The relay ops to be extracted

tvm.autotvm.task.topi_integration.get_workload(outs, task_name=None)[source]#

Retrieve the workload from outputs

tvm.autotvm.task.topi_integration.register_topi_compute(task_name, func=None)[source]#

Register a tunable template for a topi compute function.

The registration will wrap this topi compute to take cfg as the first argument, followed by the original argument list. It uses all its arguments as the workload and stores this "workload" in its final ComputeOp, which can be used to reconstruct the "workload" in the following topi_schedule call.

Parameters#

task_name: str

The AutoTVM task name

func: None or callable

If it is None, return a decorator. If it is callable, decorate this function.

Returns#

decorator: callable

A decorator

Examples#

See tvm/topi/python/topi/arm_cpu/depthwise_conv2d.py for example usage.
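The "return a decorator when func is None, otherwise decorate func" convention can be shown with a toy registry in plain Python. Everything here is hypothetical: `register_compute`, the `CONFIGS` table and the string configs stand in for the real machinery, and the wrapper simply looks up a config keyed by the hashable call arguments before passing it as the first argument.

```python
CONFIGS = {}  # hypothetical table: workload tuple -> config


def register_compute(task_name, func=None):
    """Usable as @register_compute(name) or as register_compute(name, func)."""

    def _decorate(compute):
        def wrapper(*args):
            # All call arguments form the workload, so they must be hashable.
            workload = (task_name,) + args
            cfg = CONFIGS.get(workload, "default-cfg")
            return compute(cfg, *args)   # cfg is injected as the first argument

        return wrapper

    return _decorate(func) if func is not None else _decorate


@register_compute("scale")
def scale(cfg, value, factor):
    return (cfg, value * factor)


def shift(cfg, value, offset):
    return (cfg, value + offset)


shift = register_compute("shift", shift)   # direct form, func given

CONFIGS[("scale", 3, 2)] = "tuned-cfg"
print(scale(3, 2))   # ('tuned-cfg', 6)
print(shift(3, 2))   # ('default-cfg', 5)
```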

tvm.autotvm.task.topi_integration.register_topi_schedule(task_name, func=None)[source]#

Register a tunable template for a topi schedule function.

The registration will wrap this topi schedule to take cfg as the first argument, followed by the original argument list.

Note that this function will try to find "workload" from all the ComputeOp in the input. You can attach "workload" to your compute op by using register_topi_compute.

The task name has to be the same as that of the corresponding topi compute function.

Parameters#

task_name: str

The AutoTVM task name

func: None or callable

If it is None, return a decorator. If it is callable, decorate this function.

Returns#

decorator: callable

A decorator

Examples#

See tvm/topi/python/topi/arm_cpu/depthwise_conv2d.py for example usage.

tvm.autotvm.record#

Tuning record and serialization format

tvm.autotvm.record.decode(row, protocol='json')[source]#

Decode encoded record string to python object

Parameters#

row: str

a row in the logger file

protocol: str

log protocol, json or pickle

Returns#

ret: tuple(autotvm.measure.MeasureInput, autotvm.measure.MeasureResult), or None

The tuple of input and result, or None if the input uses an old version of the log format.

tvm.autotvm.record.encode(inp, result, protocol='json')[source]#

Encode a (MeasureInput, MeasureResult) pair to a string

Parameters#

inp: autotvm.measure.MeasureInput

result: autotvm.measure.MeasureResult

pair of input/result

protocol: str

log protocol, json or pickle

Returns#

row: str

a row in the logger file

tvm.autotvm.record.load_from_buffer(file)[source]#

Generator: load records from buffer. This is a generator that yields the records.

Parameters#

file: io.TextIOBase

Yields#

input: autotvm.measure.MeasureInput

result: autotvm.measure.MeasureResult

Parameters:

file (TextIOBase)

tvm.autotvm.record.load_from_file(filepath)[source]#

Generator: load records from path. This is a generator that yields the records.

Parameters#

filepath: str, bytes, or os.PathLike

Yields#

input: autotvm.measure.MeasureInput

result: autotvm.measure.MeasureResult

Parameters:

filepath (str | bytes | PathLike)

tvm.autotvm.record.measure_str_key(inp, include_config=True)[source]#

Get a unique str key for MeasureInput

Parameters#

inp: autotvm.measure.MeasureInput

input for the measure

include_config: bool, optional

whether to include the config in the str key

Returns#

key: str

The str representation of key

tvm.autotvm.record.pick_best(in_file, out_file)[source]#

Pick the best entries from a file and store them to another file. This function distills the useful log entries from a large log file. If out_file already exists, the best entries from both in_file and out_file will be saved.

Parameters#

in_file: str

The filename of input

out_file: str or file

The filename of output
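The distilling step can be sketched on in-memory records without any file handling. This is a simplified illustration under assumptions: `Record` is a hypothetical stand-in for a (MeasureInput, MeasureResult) pair, an error_no of 0 is taken to mean a successful measurement, and "best" is taken to mean the lowest mean cost per workload.

```python
from collections import namedtuple

# Hypothetical flattened record; the real records are (MeasureInput, MeasureResult) pairs.
Record = namedtuple("Record", ["workload", "config", "costs", "error_no"])


def pick_best(records):
    """Keep, for every workload, the successful record with the lowest mean cost."""
    best = {}
    for rec in records:
        if rec.error_no != 0:
            continue                      # skip failed measurements
        mean_cost = sum(rec.costs) / len(rec.costs)
        if rec.workload not in best or mean_cost < best[rec.workload][0]:
            best[rec.workload] = (mean_cost, rec)
    return [rec for _, rec in best.values()]


log = [
    Record("conv", "a", [0.30, 0.32], 0),
    Record("conv", "b", [0.20, 0.22], 0),
    Record("conv", "c", [0.01], 1),       # fastest on paper, but the run failed
    Record("dense", "d", [0.50], 0),
]
winners = pick_best(log)
print([(r.workload, r.config) for r in winners])  # [('conv', 'b'), ('dense', 'd')]
```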

tvm.autotvm.record.split_workload(in_file, clean=True)[source]#

Split a log file into separate files, each of which contains only a single workload. This function can also delete duplicated records in the log file.

Parameters#

in_file: str

input filename

clean: bool

whether to delete duplicated items