tvm.auto_scheduler

Namespace for TVM Auto-scheduler.

Classes:

ApplyHistoryBest(records[, n_lines, ...])

Apply the history best config

ApplyHistoryBestOrSample(records[, ...])

Apply the history best config, or sample a valid schedule if no config is found.

ComputeDAG(compute_or_sche)

The auto-scheduler's computational graph and related program analyses.

DispatchContext()

Base class of dispatch context.

EmptyPolicy(task[, init_search_callbacks])

A simple example of the search policy which always returns the initial naive schedule (state).

HardwareParams([num_cores, ...])

The parameters of target hardware used to guide the search policy.

LayoutRewriteOption()

Options for applying layout rewrite.

LocalBuilder([timeout, n_parallel, build_func])

LocalBuilder uses local CPU cores to build programs in parallel.

LocalRPCMeasureContext([priority, ...])

A context wrapper for running RPCRunner locally.

LocalRunner([timeout, number, repeat, ...])

LocalRunner that uses the local CPU/GPU to measure the time cost of programs.

MeasureInput(task, state)

Store the input of a measurement.

MeasureResult(costs, error_no, error_msg, ...)

Store the results of a measurement.

PreloadCustomSketchRule(meet_condition_func, ...)

A SearchCallback for SketchSearchPolicy that allows users to add custom sketch rules.

PreloadMeasuredStates(filename)

A SearchCallback to load measured states from the log file for a search policy.

RPCRunner(key, host, port[, priority, ...])

RPCRunner that uses RPC calls to measure the time cost of programs on remote devices.

RandomModel()

A model that returns random estimation for all inputs

RecordReader(filename)

Reader of the json log file.

RecordToFile(filename)

A measurement callback that writes measurement records into a file.

SearchTask([func, args, compute_dag, ...])

The computation information and hardware parameters for a schedule search task.

SketchPolicy(task[, program_cost_model, ...])

The search policy that searches in a hierarchical search space defined by sketches.

TaskScheduler(tasks[, task_weights, ...])

Allocate the time resources when tuning multiple tasks together.

TuningOptions([num_measure_trials, ...])

This controls the options of performance tuning.

XGBModel([verbose_eval, num_warmup_sample, ...])

Train an XGBoost model to predict the normalized throughputs of programs.

Functions:

auto_schedule(task[, search_policy, ...])

THIS API IS DEPRECATED.

create_task(func, args, target[, ...])

THIS API IS DEPRECATED.

extract_tasks(mod, params, target[, ...])

Extract tuning tasks from a relay program.

get_shape_from_rewritten_layout(...)

Get the original shape from a rewritten layout string.

is_auto_scheduler_enabled()

Return whether the auto-scheduler is enabled.

load_best_record(filename[, workload_key, ...])

Return the best measurement pair from a log file.

load_records(filename)

Load measurement records from a file.

make_workload_key(func, args)

Make a workload key by function and arguments.

register_task_input_check_func(func_name[, ...])

Register a function that checks the input buffer map.

register_workload(func_name[, f, override])

Register a function that generates a certain workload.

remove_index_check(tensor)

Remove the safety check in the indexing function for a tensor.

rewrite_compute_body(compute_tensor, new_layout)

Rewrite the body of a ComputeOp according to a new layout of a placeholder

rewrite_tensor_shape(tensor, shape)

Rewrite the tensor shape

save_records(filename, inputs, results)

Append measure records to file.

class tvm.auto_scheduler.ApplyHistoryBest(records, n_lines=None, include_compatible=False)

Apply the history best config

Parameters

records: str, list of str, or iterator of (auto_scheduler.measure.MeasureInput, auto_scheduler.measure.MeasureResult)

Collection of tuning records. If it is a str, it should be the filename of a records log file, in which each row is an encoded record pair. If it is an iterator, it can be either a set of str filenames, which will be applied jointly, or a set of (input, result) tuples.

n_lines: Optional[int]

If it is not None, only the first n_lines lines of the log are loaded.

include_compatible: bool

When set to True, compatible records will also be considered.
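A small pure-Python sketch can make the "history best" selection concrete. It assumes that the best record for a (target, workload) pair is the successful measurement with the lowest mean cost. Records are modeled as plain tuples instead of real MeasureInput/MeasureResult objects. pick_best and the tuple layout are hypothetical and do not reproduce TVM's implementation.

```python
import statistics

# Hypothetical stand-ins for (MeasureInput, MeasureResult) pairs:
# (target_key, workload_key, costs, error_no, state)
records = [
    ("llvm", '["matmul", 128, 128, 128]', [0.0021, 0.0023], 0, "state_a"),
    ("llvm", '["matmul", 128, 128, 128]', [0.0015, 0.0017], 0, "state_b"),
    ("llvm", '["matmul", 128, 128, 128]', [0.0001], 1, "state_failed"),
    ("llvm", '["conv2d", 1, 64, 56]', [0.0040], 0, "state_c"),
]


def pick_best(records, n_lines=None):
    """Keep the lowest-mean-cost state per (target_key, workload_key)."""
    best = {}
    for i, (target_key, workload_key, costs, error_no, state) in enumerate(records):
        if n_lines is not None and i >= n_lines:
            break  # like n_lines: only the first n_lines records are considered
        if error_no != 0:
            continue  # skip failed measurements
        cost = statistics.mean(costs)
        key = (target_key, workload_key)
        if key not in best or cost < best[key][1]:
            best[key] = (state, cost)
    return best


best = pick_best(records)
print(best[("llvm", '["matmul", 128, 128, 128]')][0])  # state_b
```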

Methods:

get_workload_entry(best_records, target_key, ...)

Get the entry of the target key and workload key hash in the given best record map.

load(records[, n_lines])

Load records into this dispatch context.

update(target, workload_key, state)

Update the config for a workload

static get_workload_entry(best_records, target_key, workload_key)

Get the entry of the target key and workload key hash in the given best record map.

Parameters

best_records: Dict[str, Dict[str, Dict[str, Any]]]

The best record map.

target_key: str

The first key to the best_records.

workload_key: str

The workload key that can be decoded to workload hash and args.

Returns

entry: Dict[str, Any]

The entry in best_records with target key and workload hash.

workload_hash: str

The workload hash decoded from workload_key.

workload_args: Tuple[Any, …]

The hashable tuple of workload args decoded from workload_key.
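The lookup that get_workload_entry performs can be sketched in plain Python. The sketch assumes the workload key is a JSON list whose first element is the workload hash (or registered function name) and whose remaining elements are the arguments, as in the example key below. The helpers get_workload_entry and to_hashable here are hypothetical stand-ins, not the TVM source.

```python
import json


def to_hashable(value):
    """Recursively convert lists to tuples so decoded args can serve as dict keys."""
    if isinstance(value, list):
        return tuple(to_hashable(v) for v in value)
    return value


def get_workload_entry(best_records, target_key, workload_key):
    # Assumption: workload_key is a JSON list [hash_or_name, arg0, arg1, ...].
    decoded = json.loads(workload_key)
    workload_hash, workload_args = decoded[0], to_hashable(decoded[1:])
    # Create the nested dicts on first use, then return the inner entry.
    entry = best_records.setdefault(target_key, {}).setdefault(workload_hash, {})
    return entry, workload_hash, workload_args


best_records = {}
entry, whash, wargs = get_workload_entry(best_records, "llvm", '["matmul", 128, 128, 128]')
entry[wargs] = ("state", 0.0016)  # store a (state, cost) pair under the hashable args
print(whash, wargs)  # matmul (128, 128, 128)
```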

load(records, n_lines=None)

Load records into this dispatch context.

Parameters

records: str or iterator of (auto_scheduler.measure.MeasureInput, auto_scheduler.measure.MeasureResult)

Collection of tuning records. If it is a str, it should be the filename of a records log file, in which each row is an encoded record pair. Otherwise, it is an iterator.

n_lines: Optional[int]

If it is not None, only the first n_lines lines of the log are loaded.

update(target, workload_key, state)

Update the config for a workload

Parameters

target: Target

The current target

workload_key: str

The current workload_key.

state: StateObject

The state that stores schedule configuration for the workload

class tvm.auto_scheduler.ApplyHistoryBestOrSample(records, sample_simple_workloads=False, cost_model_file=None, num_measure=-1)

Apply the history best config, or sample a valid schedule if no config is found.

Parameters

records: str or iterator of (auto_scheduler.measure.MeasureInput, auto_scheduler.measure.MeasureResult)

Collection of tuning records. If it is a str, it should be the filename of a records log file, in which each row is an encoded record pair. Otherwise, it is an iterator.

sample_simple_workloads: bool

When False, sampling will not apply to simple workloads (those without reduction).

cost_model_file: str

The filename of the pre-trained XGBoost cost model. If not present, a random model will be used.

num_measure: int

Measure the top-N ranked sampled schedules on the device. The default, -1, means no measurement: the top-1 schedule ranked by the cost model is simply returned.
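The sketch below shows how num_measure changes the outcome. It ranks sampled schedules by a predicted score and returns either the top-1 index (num_measure=-1) or the top-N indices that would then be measured on the device. It assumes a higher score means a better predicted schedule. select_candidates is a hypothetical helper and performs no real measurement.

```python
def select_candidates(scores, num_measure=-1):
    """Return indices of sampled schedules, best predicted score first.

    num_measure == -1: no on-device measurement; keep only the top-1 index.
    Otherwise: keep the top-N indices, which would then be measured on the device.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if num_measure == -1:
        return ranked[:1]
    return ranked[:num_measure]


predicted = [0.2, 0.9, 0.5]  # made-up cost-model scores (higher is better)
print(select_candidates(predicted))                 # [1]
print(select_candidates(predicted, num_measure=2))  # [1, 2]
```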

Methods:

query(target, workload_key, has_complex_op, ...)

Query the context to get the specific config for a workload.

query(target, workload_key, has_complex_op, dag, func_name)

Query the context to get the specific config for a workload. If this function cannot find the result inside this context, it will query the result from the upper contexts.

Parameters

target: Target

The current target

workload_key: str

The workload key

has_complex_op: bool

Whether this workload has at least one complex op.

dag: ComputeDAG

The ComputeDAG of the workload.

func_name: str

The function name of this workload.

Returns

state: StateObject

The state that stores schedule configuration for the workload

class tvm.auto_scheduler.ComputeDAG(compute_or_sche)

The auto-scheduler’s computational graph and related program analyses.

We convert a compute declaration described by tvm.compute (which could be a single operator or a subgraph) to a ComputeDAG. It keeps the input/output tensors, all operations in the DAG, and some static analysis results for the DAG (e.g., the total floating-point operation count, the consumer/producer relations of operations, and whether an operation stage should be tiled or compute-inlined). These analyses can help the search policy make decisions during the search. ComputeDAG is also responsible for the interaction between the auto-scheduler’s LoopState and the TVM schedule (e.g., applying the LoopState transform steps to a TVM schedule, and providing LoopState with extra information obtained from the TVM schedule).

Parameters

compute_or_sche: Union[List[Tensor], str, tvm.te.Schedule]

Input/output tensors, a workload key, or a schedule for a compute declaration.

Methods:

apply_steps_from_state(state[, layout_rewrite])

Apply the history transform steps from a State to get a TVM schedule.

get_init_state()

Get the init state of this ComputeDAG.

infer_bound_from_state(state)

Infer and fill the bound of all iterators of a state.

print_python_code_from_state(state)

Print transform steps in the history of a State as TVM's python schedule code.

rewrite_layout_from_state(state)

Rewrite the layout of the DAG according to the history transform steps of a state.

workload_key()

Return the workload key of this compute DAG.

apply_steps_from_state(state, layout_rewrite=0)

Apply the history transform steps from a State to get a TVM schedule.

Parameters

state: Union[State, StateObject]

The state from which we get transform steps.

layout_rewrite: LayoutRewriteOption = NO_REWRITE

Rewrite the layout of placeholders specified by the “layout_free_placeholders” attr to make them friendlier for the generated schedule to read from.

Returns

A te.Schedule and a list of te.Tensor to be used in tvm.lower or tvm.build.

get_init_state()

Get the init state of this ComputeDAG.

Returns

state: State

The initial State without any transform steps.

infer_bound_from_state(state)

Infer and fill the bound of all iterators of a state.

The states may lose complete bound information after some transform steps (e.g., compute_at). We can call this function to infer and fill all the bound information. This function calls the TVM InferBound pass internally to get the bound. The returned state of this function is guaranteed to have complete iterator extent information.

Parameters

state: Union[State, StateObject]

The state from which we get transform steps.

Returns

updated_state: State

The State with complete bound information.

print_python_code_from_state(state)

Print transform steps in the history of a State as TVM’s python schedule code.

This is used to print transformation steps for debugging. Use apply_steps_from_state if you want to get a schedule for code generation.

Parameters

state: Union[State, StateObject]

The state from which we get transform steps.

Returns

code: str

The Python schedule code.

rewrite_layout_from_state(state)

Rewrite the layout of the DAG according to the history transform steps of a state.

Parameters

state: Union[State, StateObject]

The state from which we get transform steps.

Returns

updated_dag: ComputeDAG

The compute DAG with rewritten layout.

workload_key()

Return the workload key of this compute DAG. The workload key is a JSON string from a tuple of (hash of DAG, tensor shapes…)

Returns

key: str

The workload key of this compute DAG

class tvm.auto_scheduler.DispatchContext

Base class of dispatch context.

Methods:

_query_inside(target, workload_key, func_name)

Query the context to get the specific config for a workload.

query(target, workload_key, has_complex_op, ...)

Query the context to get the specific config for a workload.

update(target, workload_key, state)

Update the config for a workload

_query_inside(target, workload_key, func_name)

Query the context to get the specific config for a workload. This function only queries the config inside this context.

Parameters

target: Target

The current target

workload_key: str

The current workload_key.

func_name: str

The function name of this workload.

Returns

state: StateObject

The schedule configuration for the workload

query(target, workload_key, has_complex_op, dag, func_name)

Query the context to get the specific config for a workload. If this function cannot find the result inside this context, it will query the result from the upper contexts.

Parameters

target: Target

The current target

workload_key: str

The workload key

has_complex_op: bool

Whether this workload has at least one complex op.

dag: ComputeDAG

The ComputeDAG of the workload.

func_name: str

The function name of this workload.

Returns

state: StateObject

The state that stores schedule configuration for the workload

update(target, workload_key, state)

Update the config for a workload

Parameters

target: Target

The current target

workload_key: str

The current workload_key.

state: StateObject

The state that stores schedule configuration for the workload
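The fallback behavior of query() can be sketched with a tiny hypothetical class: it looks inside its own context first, then asks the upper contexts. ToyDispatchContext and its parent attribute are assumptions made for illustration. The real DispatchContext methods also receive has_complex_op, dag and func_name, which this sketch omits.

```python
class ToyDispatchContext:
    """Hypothetical dispatch context that falls back to an upper (parent) context."""

    def __init__(self, configs=None, parent=None):
        self.configs = dict(configs or {})  # (target, workload_key) -> state
        self.parent = parent  # the upper context, or None

    def _query_inside(self, target, workload_key):
        # Only look inside this context.
        return self.configs.get((target, workload_key))

    def query(self, target, workload_key):
        state = self._query_inside(target, workload_key)
        if state is None and self.parent is not None:
            return self.parent.query(target, workload_key)  # ask the upper context
        return state

    def update(self, target, workload_key, state):
        self.configs[(target, workload_key)] = state


outer = ToyDispatchContext({("llvm", "k1"): "s1"})
inner = ToyDispatchContext({("llvm", "k2"): "s2"}, parent=outer)
print(inner.query("llvm", "k1"))  # s1 (found in the upper context)
```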

class tvm.auto_scheduler.EmptyPolicy(task, init_search_callbacks=None)

A simple example of the search policy which always returns the initial naive schedule (state).

Parameters

task: SearchTask

The SearchTask for the computation declaration.

init_search_callbacks: Optional[List[SearchCallback]]

Callback functions called before the search process.

class tvm.auto_scheduler.HardwareParams(num_cores=None, vector_unit_bytes=None, cache_line_bytes=None, max_shared_memory_per_block=None, max_local_memory_per_block=None, max_threads_per_block=None, max_vthread_extent=None, warp_size=None, target=None, target_host=None)

The parameters of target hardware used to guide the search policy.

When a parameter isn’t provided, the current machine’s default value will be used instead if target is specified.

TODO(jcf94): This is considered to be merged with the new Target specification: https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844

Parameters

num_cores: int, optional

The number of device cores.

vector_unit_bytes: int, optional

The width of vector units in bytes.

cache_line_bytes: int, optional

The size of a cache line in bytes.

max_shared_memory_per_block: int, optional

The max shared memory per block in bytes.

max_local_memory_per_block: int, optional

The max local memory per block in bytes.

max_threads_per_block: int, optional

The max number of threads per block.

max_vthread_extent: int, optional

The max vthread extent.

warp_size: int, optional

The number of threads in a warp.

target: str or Target, optional

The compilation target. Used to determine default values if provided.

target_host: str or Target, optional

The compilation target host. Used to determine default values if provided.

Methods:

__str__()

Pretty printing for hardware parameter configuration.

__str__()

Pretty printing for hardware parameter configuration.

class tvm.auto_scheduler.LayoutRewriteOption

Options for applying layout rewrite.

The NO_REWRITE and INSERT_TRANSFORM_STAGE are expected to be used when tuning a standalone op, and the REWRITE_FOR_PRE_TRANSFORMED is expected to be used when tuning ops inside a network.

Methods:

get_target_default(target[, ...])

Get the default layout rewrite option for the specified target.

static get_target_default(target, in_relay_integration=False)

Get the default layout rewrite option for the specified target. Currently, layout rewrite is only enabled for the CPU and Mali backends.

Parameters

target: tvm.target.Target

The compilation target.

in_relay_integration: bool

Whether this check is for Relay integration.

Returns

layout_rewrite_option: LayoutRewriteOption

The default layout rewrite option for the specified target.
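The default selection described for get_target_default can be sketched as follows. The sketch rests on three assumptions:

- the enum values are NO_REWRITE=0, INSERT_TRANSFORM_STAGE=1 and REWRITE_FOR_PRE_TRANSFORMED=2;
- "CPU" means the llvm target kind;
- Mali is identified by the target's device attribute.

ToyLayoutRewriteOption and default_layout_rewrite are hypothetical names. Real code should call LayoutRewriteOption.get_target_default.

```python
from enum import IntEnum


class ToyLayoutRewriteOption(IntEnum):
    # Assumed values, consistent with the layout_rewrite=0 default of apply_steps_from_state.
    NO_REWRITE = 0
    INSERT_TRANSFORM_STAGE = 1
    REWRITE_FOR_PRE_TRANSFORMED = 2


def default_layout_rewrite(target_kind, device=None, in_relay_integration=False):
    """Enable layout rewrite only for CPU (llvm) or Mali targets."""
    if target_kind == "llvm" or device == "mali":
        if in_relay_integration:
            # Inside a network: layout-free inputs are assumed to be transformed beforehand.
            return ToyLayoutRewriteOption.REWRITE_FOR_PRE_TRANSFORMED
        # Standalone op: insert an explicit layout transform stage.
        return ToyLayoutRewriteOption.INSERT_TRANSFORM_STAGE
    return ToyLayoutRewriteOption.NO_REWRITE


print(default_layout_rewrite("llvm").name)  # INSERT_TRANSFORM_STAGE
```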

class tvm.auto_scheduler.LocalBuilder(timeout=15, n_parallel=4, build_func='default')

LocalBuilder uses local CPU cores to build programs in parallel.

Parameters

timeout: int = 15

The timeout limit (in seconds) for each build thread. This is used in a wrapper of multiprocessing.Process.join().

n_parallel: int = multiprocessing.cpu_count()

Number of threads used to build in parallel.

build_func: callable or str = “default”

If it is ‘default’, use the default build function. If it is ‘ndk’, use the function for the Android NDK. If it is callable, use it as a custom build function; it is expected to have a lib_format field.
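The dispatch on the build_func argument might look like the sketch below. The stand-in build functions only return tuples. The lib_format check for custom callables is an assumption based on the description above. resolve_build_func and my_build are hypothetical and not part of TVM.

```python
def resolve_build_func(build_func="default"):
    """Map a build_func argument to a (name, callable) pair (hypothetical sketch)."""

    def default_build(output, objects):
        return ("default", output, objects)  # stand-in for the default packaging step

    def ndk_build(output, objects):
        return ("ndk", output, objects)  # stand-in for an Android NDK build

    if build_func == "default":
        return "default", default_build
    if build_func == "ndk":
        return "ndk", ndk_build
    if callable(build_func):
        # Assumption: a custom build function must carry a lib_format attribute.
        if not hasattr(build_func, "lib_format"):
            raise ValueError("a custom build_func needs a lib_format attribute")
        return "custom", build_func
    raise ValueError(f"Invalid build_func: {build_func!r}")


def my_build(output, objects):  # hypothetical custom build function
    return output


my_build.lib_format = "so"
name, func = resolve_build_func(my_build)
print(name)  # custom
```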

class tvm.auto_scheduler.LocalRPCMeasureContext(priority=1, n_parallel=1, timeout=10, number=3, repeat=1, min_repeat_ms=0, cooldown_interval=0.0, enable_cpu_cache_flush=False, device=0)

A context wrapper for running RPCRunner locally. This will launch a local RPC Tracker and a local RPC Server.

Parameters

priority: int = 1

The priority of this run request; a larger value means a higher priority.

n_parallel: int = 1

The number of tasks run in parallel.

timeout: int = 10

The timeout limit (in seconds) for each run. This is used in a wrapper of multiprocessing.Process.join().

number: int = 3

The number of times to run the generated code for taking the average. We call these runs one repeat of measurement.

repeat: int = 1

The number of times to repeat the measurement. In total, the generated code will be run (1 + number x repeat) times, where the first “1” is a warm-up run that will be discarded. The returned result contains repeat costs, each of which is an average of number costs.

min_repeat_ms: int = 0

The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, the parameter number will be dynamically adjusted to meet the minimum duration requirement of one repeat, i.e., when the run time of one repeat falls below this time, number will be automatically increased.

cooldown_interval: float = 0.0

The cooldown interval between two measurements in seconds.

enable_cpu_cache_flush: bool = False

Whether to flush cache on CPU between repeated measurements. Flushing cache can make the measured latency of one operator closer to its actual latency during end-to-end inference. To make this option effective, the argument number should also be set to 1. This only has an effect on CPU tasks.

device: int = 0

Which device to run on if multiple are available.
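A small sketch can check the run count and averaging described for number and repeat. total_runs and aggregate are hypothetical helpers that operate on made-up timings. They follow the description above (one discarded warm-up run, then repeat groups of number runs), not TVM's measurement code.

```python
import statistics


def total_runs(number=3, repeat=1):
    # One discarded warm-up run plus `repeat` groups of `number` runs.
    return 1 + number * repeat


def aggregate(run_times, number, repeat):
    """Turn raw per-run times (warm-up first) into `repeat` averaged costs."""
    if len(run_times) != total_runs(number, repeat):
        raise ValueError("unexpected number of timings")
    measured = run_times[1:]  # drop the warm-up run
    return [statistics.mean(measured[r * number:(r + 1) * number]) for r in range(repeat)]


times = [9.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up timings in ms
print(total_runs(3, 2))                      # 7
print(aggregate(times, number=3, repeat=2))  # [2.0, 5.0]
```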

class tvm.auto_scheduler.LocalRunner(timeout=10, number=3, repeat=1, min_repeat_ms=100, cooldown_interval=0.0, enable_cpu_cache_flush=False, device=0)

LocalRunner that uses the local CPU/GPU to measure the time cost of programs.

Parameters

timeout: int = 10

The timeout limit (in seconds) for each run. This is used in a wrapper of multiprocessing.Process.join().

number: int = 3

The number of times to run the generated code for taking the average. We call these runs one repeat of measurement.

repeat: int = 1

The number of times to repeat the measurement. In total, the generated code will be run (1 + number x repeat) times, where the first “1” is a warm-up run that will be discarded. The returned result contains repeat costs, each of which is an average of number costs.

min_repeat_ms: int = 100

The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, the parameter number will be dynamically adjusted to meet the minimum duration requirement of one repeat, i.e., when the run time of one repeat falls below this time, number will be automatically increased.

cooldown_interval: float = 0.0

The cooldown interval between two measurements in seconds.

enable_cpu_cache_flush: bool = False

Whether to flush cache on CPU between repeated measurements. Flushing cache can make the measured latency of one operator closer to its actual latency during end-to-end inference. To make this option effective, the argument number should also be set to 1. This only has an effect on CPU tasks.

device: int = 0

Which device to run on if multiple are available.
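The effect of min_repeat_ms can be approximated with a simple rule: if one repeat of number runs would be shorter than min_repeat_ms, raise number until it is not. This is a simplified, hypothetical sketch, and adjust_number is not a TVM function. The real adjustment happens inside TVM's runtime time evaluator and may grow number differently.

```python
import math


def adjust_number(number, per_run_ms, min_repeat_ms):
    """Smallest run count that makes one repeat last at least min_repeat_ms.

    Simplified sketch: if `number` runs already take long enough (or the
    minimum is disabled with 0), `number` is kept unchanged.
    """
    if min_repeat_ms <= 0 or number * per_run_ms >= min_repeat_ms:
        return number
    return math.ceil(min_repeat_ms / per_run_ms)


print(adjust_number(3, 0.5, 100))   # 200: a fast kernel needs many more runs
print(adjust_number(3, 50.0, 100))  # 3: three 50 ms runs already exceed 100 ms
```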

class tvm.auto_scheduler.MeasureInput(task, state)

Store the input of a measurement.

Parameters

task: SearchTask

The SearchTask of this measurement.

state: Union[State, StateObject]

The State to be measured.

Methods:

serialize()

Custom serialization to work around MeasureInput not exposing all its members to the TVM FFI interface.

serialize()

Custom serialization to work around MeasureInput not exposing all its members to the TVM FFI interface.

Note that we do not implement __getstate__ as it does not seem to work with initialization of the workload registry (maybe because of initialization order?).

class tvm.auto_scheduler.MeasureResult(costs, error_no, error_msg, all_cost, timestamp)

Store the results of a measurement.

Parameters

costs: List[float]

The time costs of execution.

error_no: int

The error code.

error_msg: Optional[str]

The error message if there is any error.

all_cost: float

The time cost of build and run.

timestamp: float

The timestamp of this measurement.

class tvm.auto_scheduler.PreloadCustomSketchRule(meet_condition_func, apply_func, rule_name='CustomSketchRule')

A SearchCallback for SketchSearchPolicy that allows users to add custom sketch rules.

Notes

This is an advanced feature. Make sure you understand how it works; it should only be used with SketchSearchPolicy.

Parameters

meet_condition_func: Callable

A function with signature (policy, state, stage_id) -> int. It should return one of the values of the result enumeration.

apply_func: Callable

A function with (policy, state, stage_id) -> [[State, int], …].

rule_name: str = “CustomSketchRule”

The name of this custom sketch rule.
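A toy driver helps picture how the two callbacks interact. The return codes PASS, APPLY and APPLY_AND_SKIP_REST mirror constants that TVM defines on PreloadCustomSketchRule; this is an assumption worth verifying against your TVM version. States are plain strings, and the int in each [state, int] pair is treated here as the next stage id (also an assumption). expand, always_apply and default_apply are hypothetical stand-ins for the C++ sketch generation loop, not its actual logic.

```python
# Assumed return codes, mirroring constants on PreloadCustomSketchRule.
PASS, APPLY, APPLY_AND_SKIP_REST = 0, 1, 2


def my_condition(policy, state, stage_id):
    # Toy condition: the custom rule only fires on stage 0.
    return APPLY_AND_SKIP_REST if stage_id == 0 else PASS


def my_apply(policy, state, stage_id):
    # Returns [[new_state, next_stage_id], ...]; states are plain strings here.
    return [[state + f"+custom@{stage_id}", stage_id - 1]]


def always_apply(policy, state, stage_id):
    return APPLY


def default_apply(policy, state, stage_id):
    return [[state + f"+default@{stage_id}", stage_id - 1]]


RULES = [(my_condition, my_apply), (always_apply, default_apply)]


def expand(policy, state, stage_id, rules):
    """Toy driver: try rules in order for one stage, honoring the return codes."""
    results = []
    for condition, apply in rules:
        code = condition(policy, state, stage_id)
        if code == PASS:
            continue  # rule does not match this stage
        results.extend(apply(policy, state, stage_id))
        if code == APPLY_AND_SKIP_REST:
            break  # do not try the remaining rules
    return results


print(expand(None, "s", 0, RULES))  # [['s+custom@0', -1]]
print(expand(None, "s", 1, RULES))  # [['s+default@1', 0]]
```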

class tvm.auto_scheduler.PreloadMeasuredStates(filename)

A SearchCallback to load measured states from the log file for a search policy.

This can resume the state of the search policy:
  • It makes sure that a state already measured in former searches will never be measured again.

  • The history states can be used to speed up the search process (e.g., SketchPolicy uses history states as the starting point for evolutionary search).

Parameters

filename: str

The name of the record file.
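The deduplication that PreloadMeasuredStates enables can be sketched as a set of already-seen states. MeasuredStateFilter is a hypothetical helper and states are plain strings. In TVM the preloaded states come from the record file, and the bookkeeping happens inside the search policy.

```python
class MeasuredStateFilter:
    """Hypothetical helper: skip states that were already measured."""

    def __init__(self, preloaded=()):
        self.seen = set(preloaded)  # e.g. states recovered from a record file

    def filter_new(self, candidates):
        fresh = []
        for state in candidates:
            if state not in self.seen:
                self.seen.add(state)  # each state is measured at most once
                fresh.append(state)
        return fresh


state_filter = MeasuredStateFilter(preloaded=["a", "b"])
print(state_filter.filter_new(["a", "c", "c", "d"]))  # ['c', 'd']
```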

class tvm.auto_scheduler.RPCRunner(key, host, port, priority=1, n_parallel=1, timeout=10, number=3, repeat=1, min_repeat_ms=100, cooldown_interval=0.0, enable_cpu_cache_flush=False, device=0)

RPCRunner that uses RPC calls to measure the time cost of programs on remote devices. RPC can also be useful for local runs, to isolate the thread environment (e.g., when running CUDA programs).

Parameters

key: str

The key of the device registered in the RPC tracker.

host: str

The host address of the RPC Tracker.

port: int

The port of the RPC Tracker.

priority: int = 1

The priority of this run request; a larger value means a higher priority.

n_parallel: int = 1

The number of tasks run in parallel.

timeout: int = 10

The timeout limit (in seconds) for each run. This is used in a wrapper of multiprocessing.Process.join().

number: int = 3

The number of times to run the generated code for taking the average. We call these runs one repeat of measurement.

repeat: int = 1

The number of times to repeat the measurement. In total, the generated code will be run (1 + number x repeat) times, where the first “1” is a warm-up run that will be discarded. The returned result contains repeat costs, each of which is an average of number costs.

min_repeat_ms: int = 100

The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, the parameter number will be dynamically adjusted to meet the minimum duration requirement of one repeat, i.e., when the run time of one repeat falls below this time, number will be automatically increased.

cooldown_interval: float = 0.0

The cooldown interval between two measurements in seconds.

enable_cpu_cache_flush: bool = False

Whether to flush cache on CPU between repeated measurements. Flushing cache can make the measured latency of one operator closer to its actual latency during end-to-end inference. To make this option effective, the argument number should also be set to 1. This only has an effect on CPU tasks.

device: int = 0

Which device to run on if multiple are available.

class tvm.auto_scheduler.RandomModel

A model that returns random estimation for all inputs

Methods:

predict(search_task, states)

Predict the scores of states

update(inputs, results)

Update the cost model according to new measurement results (training data).

predict(search_task, states)

Predict the scores of states

Parameters

search_task: SearchTask

The search task of states

states: List[State]

The input states

Returns

scores: List[float]

The predicted scores for all states

update(inputs, results)

Update the cost model according to new measurement results (training data).

Parameters

inputs: List[auto_scheduler.measure.MeasureInput]

The measurement inputs

results: List[auto_scheduler.measure.MeasureResult]

The measurement results
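A random cost model fits in a few lines: predict() returns one random score per state, and update() ignores the training data. ToyRandomModel is hypothetical. The seed only makes the sketch reproducible.

```python
import random


class ToyRandomModel:
    """Hypothetical cost model that returns random scores and never learns."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def update(self, inputs, results):
        pass  # measurement results are ignored

    def predict(self, search_task, states):
        # One score in [0, 1] per state; search_task is unused.
        return [self.rng.uniform(0.0, 1.0) for _ in states]


model = ToyRandomModel(seed=0)
scores = model.predict(None, ["s0", "s1", "s2"])
```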

class tvm.auto_scheduler.RecordReader(filename)

Reader of the json log file.

Parameters

filename: str

File name for this reader to load log from.

Methods:

check_workload_key(inputs)

Check and throw warnings for records with old format workload key.

read_lines([max_lines, skip_lines])

Read multiple lines from the log file.

check_workload_key(inputs)

Check and throw warnings for records with old format workload key.

Parameters

inputs: List[MeasureInput]

The measure inputs to be checked.

Notes

This checker could be deprecated in the future.

read_lines(max_lines=None, skip_lines=0)

Read multiple lines from the log file.

Parameters

max_lines: Optional[int]

The maximum number of lines. None to read all lines.

skip_lines: int = 0

Skip the first n lines.

Returns

inputs: List[auto_scheduler.measure.MeasureInput]

The MeasureInputs loaded from the log file.

results: List[auto_scheduler.measure.MeasureResult]

The MeasureResults loaded from the log file.

Notes

Some unimportant and expensive fields in the returned MeasureInput are not deserialized, for faster reading (e.g., input.task.compute_dag, input.state.stages). If you want to use them, you can call recover_measure_input to rebuild these fields.
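The max_lines/skip_lines semantics can be illustrated on an in-memory JSON-lines log. This toy read_lines treats each line as one JSON object and is not TVM's reader, which decodes lines into MeasureInput/MeasureResult pairs. The log content is made up.

```python
import io
import json

LOG = '{"id": 0}\n{"id": 1}\n{"id": 2}\n{"id": 3}\n'  # made-up JSON-lines log


def read_lines(fp, max_lines=None, skip_lines=0):
    """Skip the first skip_lines lines, then read up to max_lines records."""
    records = []
    for i, line in enumerate(fp):
        if i < skip_lines:
            continue
        if max_lines is not None and len(records) >= max_lines:
            break
        line = line.strip()
        if line:  # ignore blank lines
            records.append(json.loads(line))
    return records


print([r["id"] for r in read_lines(io.StringIO(LOG), max_lines=2, skip_lines=1)])  # [1, 2]
```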

class tvm.auto_scheduler.RecordToFile(filename)

A measurement callback that writes measurement records into a file.

Parameters

filename: str

File name for this callback to write log to.

class tvm.auto_scheduler.SearchTask(func=None, args=None, compute_dag=None, workload_key=None, target=None, target_host=None, hardware_params=None, layout_rewrite_option=None, task_inputs=None, task_inputs_overwrite=False, task_inputs_save_to_file=False, desc='')

The computation information and hardware parameters for a schedule search task.

Parameters

func: Union[Function, str]

The function that returns the compute declaration Tensors. Can be a function or the function name.

args: Union[Tuple[Any, …], List[Any]]

The args of the function.

compute_dag: ComputeDAG

The ComputeDAG for the corresponding compute declaration.

workload_key: str

The workload key for the corresponding compute declaration.

target: any target-like object, see Target.canon_target

The target device of this search task.

target_host: None or any target-like object, see Target.canon_target

The target host device of this search task.

hardware_params: Optional[HardwareParams]

Hardware parameters used in this search task.

layout_rewrite_option: Optional[LayoutRewriteOption]

The layout rewrite option used for measuring programs. If None, the default value will be set depending on the specified target. Auto_scheduler will find a better schedule for the specified layout rewrite option. The NO_REWRITE and INSERT_TRANSFORM_STAGE are expected to be used when tuning a standalone op, and the REWRITE_FOR_PRE_TRANSFORMED is expected to be used when tuning ops inside a network.

task_inputs: Union[Dict[str, tvm.nd.NDArray], List[str]]

A dict that maps input names to input tensors, or a list of input names. These are special tensors used as inputs during program measurement. Usually we do not need to care about them, but for special workloads such as sparse computation, the sparse tensor inputs are meaningful, so random inputs cannot be used directly.

task_inputs_overwrite: bool = False

Whether to overwrite the data if a name already exists in the global table.

task_inputs_save_to_file: bool = False

Whether to also save the data to a local file. This can be reused to resume the last tuning process.

desc: str = “”

The description string of this task.

Examples

from tvm.auto_scheduler import SearchTask

# We support two ways to create a search task.
# `workload_func`, `args`, `target` and `workload_key` are placeholders here.

# Way 1: create a task from a workload generation function.
# `workload_func` is a function decorated with @auto_scheduler.register_workload
task = SearchTask(func=workload_func, args=args, target=target)

# Way 2: create a task from a workload_key.
# `workload_key` is a string, which can be either a hash key or a JSON-serialized
# tuple(func, args).
task = SearchTask(workload_key=workload_key, target=target)

Methods:

apply_best(log_file[, include_compatible, ...])

Apply the history best from a log file and return the schedule.

print_best(log_file[, print_mode])

Print the best schedule as python schedule API code or CUDA source code.

tune(tuning_options[, search_policy, ...])

Run auto scheduling search for a task

apply_best(log_file, include_compatible=False, layout_rewrite_option=None)

Apply the history best from a log file and return the schedule.

Parameters

log_file: str

The name of the log file.

include_compatible: bool

When set to True, all compatible records in the log file will be considered.

layout_rewrite_option: Optional[LayoutRewriteOption]

The layout rewrite option.

Returns

A te.Schedule and a list of te.Tensor to be used in tvm.lower or tvm.build.

print_best(log_file, print_mode='schedule')

Print the best schedule as python schedule API code or CUDA source code.

Parameters

log_file: str

The name of the log file

print_mode: str

If “schedule”, print the best schedule as python schedule API code. If “cuda”, print the best schedule as CUDA source code.

Returns

code: str

The best schedule code in python API or CUDA source code

tune(tuning_options, search_policy=None, adaptive_training=False)

Run auto scheduling search for a task

Parameters

tuning_options: TuningOptions

Tuning and measurement options.

search_policy: Optional[SearchPolicy]

The search policy to be used for schedule search.

class tvm.auto_scheduler.SketchPolicy(task, program_cost_model=auto_scheduler.RandomModel(), params=None, seed=None, verbose=1, init_search_callbacks=None)

The search policy that searches in a hierarchical search space defined by sketches. The policy randomly samples programs from the space defined by sketches and uses evolutionary search to fine-tune them.

Parameters

task: SearchTask

The SearchTask for the computation declaration.

program_cost_model: CostModel = RandomModel()

The cost model to estimate the complete schedules.

params: Optional[Dict[str, Any]]

Parameters of the search policy. See src/auto_scheduler/search_policy/sketch_search_policy.h for the definitions. See DEFAULT_PARAMS below to find the default values.

seed: Optional[int]

Random seed.

verbose: int = 1

Verbosity level. 0 for silent, 1 to output information during schedule search.

init_search_callbacks: Optional[List[SearchCallback]]

Callback functions called before the search process, usually used to do extra initializations. Possible callbacks:

  • auto_scheduler.PreloadMeasuredStates

  • auto_scheduler.PreloadCustomSketchRule

Methods:

evolutionary_search(init_populations, out_size)

Perform evolutionary search.

generate_sketches([print_for_debug])

Generate the sketches.

sample_initial_population()

Sample initial population.

evolutionary_search(init_populations, out_size)

Perform evolutionary search. This python interface is mainly used for debugging and testing. The actual search is all done in C++.

Parameters

init_populations: List[State]

The initial population states

out_size: int

The size of generated states

Returns

states: List[State]

The generated states
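The generic idea behind evolutionary search can be sketched in a few steps: mutate members of the population, score parents and children together, and keep the best out_size. This toy works on integers with a made-up fitness function. It is not SketchPolicy's algorithm, which runs in C++ with its own mutation rules and cost-model scores.

```python
import random


def evolutionary_search(init_population, out_size, score, mutate, generations=5, seed=0):
    """Toy evolutionary search: mutate, score parents and children, keep the best."""
    rng = random.Random(seed)
    population = list(init_population)
    for _ in range(generations):
        # One child per member, each mutated from a randomly chosen parent.
        children = [mutate(rng.choice(population), rng) for _ in population]
        # Parents stay in the pool, so the best score never gets worse.
        population = sorted(set(population + children), key=score, reverse=True)[:out_size]
    return population


def score(x):
    return -abs(x - 42)  # made-up fitness: values close to 42 are better


def mutate(x, rng):
    return x + rng.randint(-5, 5)


result = evolutionary_search([0, 10, 20], out_size=3, score=score, mutate=mutate)
print(result)
```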

generate_sketches(print_for_debug=False)

Generate the sketches. This python interface is mainly used for debugging and testing. The actual search is all done in C++.

Parameters

print_for_debug: bool = False

Whether to print out the sketches for debugging.

Returns

sketches: List[State]

The generated sketches of this search task.

sample_initial_population()

Sample the initial population. This Python interface is mainly used for debugging and testing; the actual search is done entirely in C++.

Returns

states: List[State]

The sampled states.

class tvm.auto_scheduler.TaskScheduler(tasks, task_weights=None, objective_func=None, strategy='gradient', load_model_file=None, load_log_file=None, alpha=0.2, beta=2, gamma=0.5, backward_window_size=3, callbacks=None)

Allocates time resources when tuning multiple tasks together. Two strategies are implemented: “round-robin” and “gradient”.

Parameters

tasks: List[SearchTask]

All tasks to tune.

task_weights: Optional[List[float]]

The weights of tasks. If provided, the task scheduler sets the objective function to sum(weight[t] * latency[t]), where weight[t] is the weight of a task and latency[t] is the latency of the task. If not provided, the task scheduler assigns equal weights to all tasks (i.e., the objective function is sum(latency[t])).

objective_func: Optional[Callable[List[float] -> float]]

The objective function to be minimized. The objective function accepts the current latencies of all tasks and returns the objective. If not provided, the objective is the weighted sum of the latencies of all tasks.
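
The default objective described above can be written as a plain function. This is a pure-Python sketch, not code taken from TVM, and `weighted_latency_objective` is a hypothetical name; it only illustrates the shape a custom `objective_func` is described as having (current latencies in, one float out).

```python
from typing import List, Optional


def weighted_latency_objective(latencies: List[float], weights: Optional[List[float]] = None) -> float:
    """Sum of weight[t] * latency[t]; unit weights for every task when none are given."""
    if weights is None:
        weights = [1.0] * len(latencies)
    if len(weights) != len(latencies):
        raise ValueError("need one weight per task")
    return sum(w * lat for w, lat in zip(weights, latencies))


# e.g. a task that appears 3 times in the network and a task that appears once
print(weighted_latency_objective([0.5, 2.0], [3, 1]))  # 3.5
```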

strategy: str = “gradient”

The scheduling strategy. “round-robin”: tune tasks in round-robin order. “gradient”: tune tasks with gradient descent.

load_model_file: Optional[str]

Load pre-trained model from this file. If this is None, the cost model will be trained from scratch.

load_log_file: Optional[str]

Load measurement records from this file. If it is not None, the status of the task scheduler, search policies and cost models will be restored according to this file.

verbose: int = 1

The level of verbosity. 0 means silent.

alpha: float = 0.2

A parameter used by the ‘gradient’ strategy.

beta: float = 2

A parameter used by the ‘gradient’ strategy.

backward_window_size: int = 3

A parameter used by the ‘gradient’ strategy.
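
To give an intuition for a gradient-style strategy, here is a simplified, hypothetical selection rule in pure Python. It is an assumption-laden sketch, not TVM's exact implementation: it estimates how fast each task's latency is dropping (averaged over the last `backward_window_size` rounds), blends that with an optimistic forward guess using `alpha`, scales by the sensitivity of the objective to that task, and picks the most negative value. It ignores `beta`, `gamma` and any similarity-group logic.

```python
from typing import Callable, List


def pick_task_gradient(
    best_costs: List[float],
    cost_history: List[List[float]],
    task_counts: List[int],
    objective: Callable[[List[float]], float],
    alpha: float = 0.2,
    backward_window_size: int = 3,
    delta: float = 1e-4,
) -> int:
    """Return the index of the task whose next round is expected to reduce the objective most."""
    base = objective(best_costs)
    grads = []
    for i, cost in enumerate(best_costs):
        # Chain-rule term: d objective / d latency of task i, by finite difference.
        perturbed = list(best_costs)
        perturbed[i] -= delta
        chain_grad = (base - objective(perturbed)) / delta
        # Backward gradient: average latency change over the last window of rounds.
        hist = cost_history[i]
        if len(hist) > backward_window_size:
            backward_grad = (hist[-1] - hist[-1 - backward_window_size]) / backward_window_size
        else:
            backward_grad = 0.0
        # Forward gradient: optimistic guess that one more round saves cost / rounds.
        forward_grad = -cost / task_counts[i]
        grads.append(chain_grad * (alpha * backward_grad + (1 - alpha) * forward_grad))
    # The most negative gradient promises the largest decrease of the objective.
    return min(range(len(grads)), key=grads.__getitem__)


best_costs = [1.0, 4.0]
history = [[2.0, 1.5, 1.2, 1.1, 1.0], [4.0, 4.0, 4.0, 4.0, 4.0]]
counts = [5, 5]
print(pick_task_gradient(best_costs, history, counts, objective=sum))  # 1
```

With equal weights the slow, not-yet-tuned second task is chosen; weighting the first task heavily in the objective flips the choice.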

callbacks: Optional[List[TaskSchedulerCallback]]

The task scheduler callbacks that will be called before and after tuning a task. If None, the PrintTableInfo and LogEstimatedLatency callbacks will be used.

Methods:

_adjust_similarity_group(task_idx)

adjust the similarity group for the selected task

_compute_score(costs)

compute the objective function

_restore_status(log_file, num_measures_per_round)

restore task_cts and best_costs from a log file

_tune_task(task_idx)

Tune the selected task for one round

tune(tune_option[, search_policy, ...])

Tune a batch of tasks together.

_adjust_similarity_group(task_idx)

adjust the similarity group for the selected task

_compute_score(costs)

compute the objective function

_restore_status(log_file, num_measures_per_round)

restore task_cts and best_costs from a log file

_tune_task(task_idx)

Tune the selected task for one round

tune(tune_option, search_policy='default', search_policy_params=None, adaptive_training=False, per_task_early_stopping=None)

Tune a batch of tasks together.

Parameters

tune_option: TuningOptions

The tuning options applied to all tasks.

search_policy: Union[str, List[SearchPolicy]] = “default”

The list of search policies. If it is a str, “default” selects the default policy (SketchPolicy + XGBModel), “sketch.xgb” selects SketchPolicy + XGBModel, and “sketch.random” selects SketchPolicy + RandomModel.

search_policy_params: Optional[Dict[str, Any]]

The parameters of the search policy.

adaptive_training: bool = False

Option used by XGBModel to reduce the model training frequency when there are too many logs.

per_task_early_stopping: Optional[int]

Stop tuning a task early if there is no improvement after n measurements.

class tvm.auto_scheduler.TuningOptions(num_measure_trials=0, early_stopping=None, num_measures_per_round=64, verbose=1, builder='local', runner='local', measure_callbacks=None)

This controls the options of performance tuning.

Parameters

num_measure_trials: int = 0

The number of measurement trials. The search policy measures num_measure_trials schedules in total and returns the best one among them. With num_measure_trials == 0, the policy will do the schedule search but won’t involve measurement. This can be used to get a runnable schedule quickly without auto-tuning.

early_stopping: Optional[int]

Stop the tuning early if there is no improvement after n measurements.

num_measures_per_round: int = 64

The number of schedules to be measured at each search round. The whole schedule search process measures num_measure_trials schedules in total, spread over several rounds.
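
How `num_measures_per_round` and `early_stopping` interact can be simulated over a pre-made list of measured costs. This is a hypothetical pure-Python sketch (`simulate_tuning` is not a TVM function), and it assumes the early-stopping check runs once at the end of each round.

```python
from typing import List, Optional, Tuple


def simulate_tuning(
    costs: List[float],
    num_measures_per_round: int,
    early_stopping: Optional[int] = None,
) -> Tuple[float, int]:
    """Consume `costs` (one per measured schedule) in rounds; return (best cost, #measured)."""
    best_cost = float("inf")
    best_ct = 0  # measurement index at which the best cost last improved
    ct = 0
    for start in range(0, len(costs), num_measures_per_round):
        for cost in costs[start:start + num_measures_per_round]:
            ct += 1
            if cost < best_cost:
                best_cost, best_ct = cost, ct
        # Early stopping: no improvement during the last `early_stopping` measurements.
        if early_stopping is not None and ct - best_ct >= early_stopping:
            break
    return best_cost, ct


costs = [5.0, 4.0, 3.0] + [3.0] * 10  # 13 trials; nothing improves after the 3rd
print(simulate_tuning(costs, num_measures_per_round=2))                    # (3.0, 13)
print(simulate_tuning(costs, num_measures_per_round=2, early_stopping=4))  # (3.0, 8)
```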

verbose: int = 1

Verbosity level. 0 for silent, 1 to output information during schedule search.

builder: Union[ProgramBuilder, str] = ‘local’

ProgramBuilder which builds the program.

runner: Union[ProgramRunner, str] = ‘local’

ProgramRunner which runs the program and measures time costs.

measure_callbacks: Optional[List[MeasureCallback]]

Callback functions called after each measurement. Candidates:

  • auto_scheduler.RecordToFile

class tvm.auto_scheduler.XGBModel(verbose_eval=25, num_warmup_sample=100, seed=None, model_file=None, adaptive_training=False)

Train an XGBoost model to predict the normalized throughputs of programs. Let the normalized throughput be the score of a program (higher is better). We predict the (approximate) score of a program as the sum of the scores of all stages in the program, i.e., score(P) = score_s0 + score_s1 + … + score_sn, where score_si is the score of Stage i in Program P. We extract features for each stage, let XGBoost predict the score of each stage, and then sum up the predictions as the score of the whole program.

We use RMSE as the loss function, i.e., loss(P, y) = 1/2 * (score(P) - y)^2, where P is the program and y is the normalized throughput according to the ground truth (measurement). XGBoost does not support this loss function directly because score(P) is the sum of the predictions of several samples, so we implemented a custom loss function called pack-sum-rmse. It is called “pack-sum” because we combine several samples into a “pack” and sum up their predictions.

Parameters

verbose_eval: int = 25

Print training log every verbose_eval iterations.

num_warmup_sample: int = 100

The minimum number of samples to start to use the trained model. If the number of samples is less than this number, the model outputs random predictions.

seed: Optional[int]

The random seed

model_file: Optional[str]

If not None, save the model to this file after every update.

adaptive_training: bool = False

Whether to use adaptive training, which reduces the training frequency when there are too many logs.
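
The pack-sum loss described in the XGBModel overview can be sketched in pure Python: per-stage predictions are summed per pack (program), and every stage of a pack receives that pack's residual as its gradient. This only illustrates the math of loss(P, y) = 1/2 * (score(P) - y)^2; the function name is hypothetical and it is not TVM's actual objective, which also has to plug into XGBoost's custom-objective interface (which expects per-sample gradients and hessians).

```python
from typing import List, Tuple


def pack_sum_squared_error(
    stage_preds: List[float], pack_ids: List[int], labels: List[float]
) -> Tuple[float, List[float], List[float]]:
    """Loss, gradient and hessian of 1/2 * (score(P) - y)^2, score(P) = sum of stage predictions."""
    # score(P): sum the per-stage predictions that belong to each pack (program).
    pack_scores = [0.0] * len(labels)
    for pred, pid in zip(stage_preds, pack_ids):
        pack_scores[pid] += pred
    loss = sum(0.5 * (s - y) ** 2 for s, y in zip(pack_scores, labels))
    # d loss / d stage_pred = score(P) - y for every stage of P; the second derivative is 1.
    grad = [pack_scores[pid] - labels[pid] for pid in pack_ids]
    hess = [1.0] * len(stage_preds)
    return loss, grad, hess


# Two programs: program 0 has two stages, program 1 has one stage.
loss, grad, hess = pack_sum_squared_error([0.25, 0.25, 0.5], [0, 0, 1], [1.0, 0.5])
print(loss, grad)  # 0.125 [-0.5, -0.5, 0.0]
```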

Methods:

load(file_name)

Load the model from a file.

predict(task, states)

Predict the scores of states.

predict_stages(task, states)

Predict the scores of all stages in states.

save(file_name)

Save the model to a file.

update(inputs, results)

Update the cost model according to new measurement results (training data).

update_from_file(file_name[, n_lines])

Load measure records from a log file to update the cost model.

load(file_name)

Load the model from a file.

Parameters

file_name: str

The filename.

predict(task, states)

Predict the scores of states.

Parameters

task: SearchTask

The search task of states.

states: List[State]

The input states.

Returns

scores: List[float]

The predicted scores for all states.

predict_stages(task, states)

Predict the scores of all stages in states. This is the breakdown version of predict.

Parameters

task: SearchTask

The search task of states.

states: List[State]

The input states.

Returns

scores: List[float]

The predicted scores for all stages in all states, in the packed format.

Note

For faster data copy between C++ and Python, the Python part returns scores in a single flattened array using a packed format. The C++ part then unpacks the flattened array. The packed format is:

{
    float scores[N];                 // scores[i] is the score for states[i]
    int   n_stage_0;                 // the number of stages in states[0]
    float stage_scores_0[n_stage_0]; // the scores for all stages in states[0]
    int   n_stage_1;                 // the number of stages in states[1]
    float stage_scores_1[n_stage_1]; // the scores for all stages in states[1]
    ...
    int   n_stage_i;                 // the number of stages in states[i]
    float stage_scores_i[n_stage_i]; // the scores for all stages in states[i]
    ...                              // until i == N - 1
}

To implement this format, we also store int as float, so all numbers fit into a single float array.
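
A pair of hypothetical pure-Python helpers makes the packed layout concrete: `pack_stage_scores` builds the flat array (stage counts stored as floats) and `unpack_stage_scores` walks it back. They are illustrations of the format only; in TVM the unpacking happens on the C++ side.

```python
from typing import List, Tuple


def pack_stage_scores(scores: List[float], stage_scores: List[List[float]]) -> List[float]:
    """Flatten into [scores..., n_stage_0, stage_scores_0..., n_stage_1, ...]."""
    flat = list(scores)
    for stages in stage_scores:
        flat.append(float(len(stages)))  # the stage count is stored as a float
        flat.extend(stages)
    return flat


def unpack_stage_scores(flat: List[float], n_states: int) -> Tuple[List[float], List[List[float]]]:
    """Inverse of pack_stage_scores for N = n_states states."""
    scores = flat[:n_states]
    pos = n_states
    stage_scores = []
    for _ in range(n_states):
        n_stage = int(flat[pos])  # read the count back as an int
        pos += 1
        stage_scores.append(flat[pos:pos + n_stage])
        pos += n_stage
    return scores, stage_scores


flat = pack_stage_scores([1.5, 2.0], [[0.5, 1.0], [2.0]])
print(flat)                          # [1.5, 2.0, 2.0, 0.5, 1.0, 1.0, 2.0]
print(unpack_stage_scores(flat, 2))  # ([1.5, 2.0], [[0.5, 1.0], [2.0]])
```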

save(file_name)

Save the model to a file.

Parameters

file_name: str

The filename.

update(inputs, results)

Update the cost model according to new measurement results (training data). XGBoost does not support incremental training, so we re-train a new model every time.

Parameters

inputs: List[MeasureInput]

The measurement inputs.

results: List[MeasureResult]

The measurement results.

update_from_file(file_name, n_lines=None)

Load measure records from a log file to update the cost model. This function can be used to pre-train the cost model with history log files.

Parameters

file_name: str

The filename.

n_lines: Optional[int]

Only load the first n lines of the log file.

tvm.auto_scheduler.auto_schedule(task, search_policy=None, tuning_options=TuningOptions())

THIS API IS DEPRECATED.

Run auto scheduling search for a task.

Parameters

task: SearchTask

The SearchTask for the computation declaration.

search_policy: Optional[SearchPolicy]

The search policy to be used for schedule search.

tuning_options: Optional[TuningOptions]

Tuning and measurement options.

Returns

A te.Schedule and a list of te.Tensor to be used in tvm.lower or tvm.build.

tvm.auto_scheduler.create_task(func, args, target, target_host=None, hardware_params=None)

THIS API IS DEPRECATED.

Create a search task.

Parameters

func: Union[Function, str]

The function that returns the compute declaration Tensors. Can be a function or the function name.

args: Union[Tuple[Any, …], List[Any]]

The args of the function.

target: Union[tvm.target.Target, str]

The target device of this search task.

target_host: Optional[Union[tvm.target.Target, str]]

The target host device of this search task.

hardware_params: Optional[HardwareParams]

Hardware parameters used in this search task.

Returns

SearchTask: the created task

tvm.auto_scheduler.extract_tasks(mod, params, target, target_host=None, hardware_params=None, include_simple_tasks=False, dump_workload_to_dag_log=None, opt_level=3, other_targets=None)

Extract tuning tasks from a relay program.

Parameters

mod: tvm.IRModule or relay.function.Function

The module or function to tune

params: dict of str to numpy array

The associated parameters of the program

target: Union[tvm.target.Target, str]

The compilation target

target_host: Optional[Union[tvm.target.Target, str]]

The host compilation target

hardware_params: Optional[HardwareParams]

Hardware parameters used for the search tasks.

include_simple_tasks: bool

Whether to extract simple tasks that do not include complicated ops.

dump_workload_to_dag_log: Optional[str]

A file to dump an association between the workload keys and the actual DAG.

opt_level: Optional[int]

The optimization level of the task extractions.

other_targets: Optional[List[tvm.target.Target]]

Other targets for call_all_topi_funcs, e.g., cutlass target.

Returns

tasks: List[SearchTask]

The tasks in this network.

weights: List[int]

The weight (i.e., the number of appearances) of each extracted task.
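
The relation between tasks and weights can be illustrated by merging identical workloads and counting them. This is a hypothetical pure-Python sketch: it assumes each extracted subgraph is identified by a workload key string and that identical keys are merged into one task, and the keys below are made up.

```python
from collections import Counter
from typing import List, Tuple


def dedup_with_weights(workload_keys: List[str]) -> Tuple[List[str], List[int]]:
    """Merge identical workload keys; weight = number of appearances (first-seen order)."""
    counts = Counter(workload_keys)
    unique_keys = list(dict.fromkeys(workload_keys))
    return unique_keys, [counts[key] for key in unique_keys]


keys = ["conv2d_a", "dense_b", "conv2d_a", "conv2d_a"]  # hypothetical keys
print(dedup_with_weights(keys))  # (['conv2d_a', 'dense_b'], [3, 1])
```

Weights of this kind are what make sum(weight[t] * latency[t]) approximate the latency of the whole network.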

tvm.auto_scheduler.get_shape_from_rewritten_layout(rewritten_layout, axis_names)

Get the original shape from a rewritten layout string.

Parameters

rewritten_layout: str

The layout after rewrite.

axis_names: List[str]

Specify the order of axes by names.

Returns

shape: List[PrimExpr]

The original shape.

tvm.auto_scheduler.is_auto_scheduler_enabled()

Return whether the auto-scheduler is enabled.

Returns

enabled: bool

Whether the auto-scheduler is enabled.

tvm.auto_scheduler.load_best_record(filename, workload_key=None, target=None, include_compatible=False)

Return the best measurement pair from a log file. This may return None results if no legal measurement pair with the specified workload_key/target is found in the log file.

Parameters

filename: str

File name to load log from.

workload_key: Optional[str]

The workload key of the compute declaration. With None, this returns the best measurement pair of all workloads.

target: Optional[tvm.target.Target]

The target device. With None, this returns the best measurement pair of all target devices.

include_compatible: bool

When set to True, all compatible records in the log file will be considered.

Returns

input: auto_scheduler.measure.MeasureInput

The best State’s MeasureInput from this log file.

result: auto_scheduler.measure.MeasureResult

The best State’s MeasureResult from this log file.
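
The filtering described for load_best_record can be sketched in pure Python. The record dictionaries below use a made-up layout, not TVM's log format, and the sketch assumes that "best" means the lowest mean measured cost among records without errors; `include_compatible` is not modeled.

```python
from statistics import mean
from typing import Dict, List, Optional


def best_record(
    records: List[Dict],
    workload_key: Optional[str] = None,
    target: Optional[str] = None,
) -> Optional[Dict]:
    """Lowest mean cost among error-free records matching the filters; None if nothing matches."""
    best, best_cost = None, float("inf")
    for rec in records:
        if workload_key is not None and rec["workload_key"] != workload_key:
            continue
        if target is not None and rec["target"] != target:
            continue
        if rec["error_no"] != 0:  # skip failed measurements
            continue
        cost = mean(rec["costs"])
        if cost < best_cost:
            best, best_cost = rec, cost
    return best


records = [
    {"workload_key": "mm", "target": "llvm", "costs": [2.0, 2.0], "error_no": 0},
    {"workload_key": "mm", "target": "llvm", "costs": [0.5], "error_no": 1},
    {"workload_key": "mm", "target": "llvm", "costs": [1.0, 1.5], "error_no": 0},
    {"workload_key": "conv", "target": "llvm", "costs": [0.1], "error_no": 0},
]
print(best_record(records, workload_key="mm")["costs"])  # [1.0, 1.5]
```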

tvm.auto_scheduler.load_records(filename)

Load measurement records from a file.

Parameters

filename: str

File name to load log from.

Returns

logs : List[auto_scheduler.measure.MeasureInput, auto_scheduler.measure.MeasureResult]

Notes

Some unimportant and expensive fields in the returned MeasureInput are not deserialized, for faster read speed (e.g., input.task.compute_dag, input.state.stages). If you want to use them, you can call recover_measure_input to rebuild these fields.

tvm.auto_scheduler.make_workload_key(func, args)

Make a workload key by function and arguments.

Parameters

func: Union[Function, str]

The function that returns the compute declaration Tensors. Can be a function or the function name.

args: Args

The args of the function.

Returns

workload_key: str

The workload key of the function.
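
One way to picture a workload key is as a JSON string built from the function name followed by its arguments. The sketch below assumes exactly that encoding; `make_key` is a hypothetical name, and the exact TVM encoding (for example, how Tensor arguments are serialized) may differ.

```python
import json
from typing import Any, Sequence


def make_key(func_name: str, args: Sequence[Any]) -> str:
    """Serialize (func_name, *args) as a JSON string usable as a dictionary key."""
    return json.dumps((func_name,) + tuple(args))


key = make_key("matmul", (128, 128, 64))
print(key)  # ["matmul", 128, 128, 64]
func_name, *args = json.loads(key)  # the key can be decoded back into name and args
```

A string key of this form is hashable, can be written to log files, and can be decoded to look up the registered function and re-create the computation.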

tvm.auto_scheduler.register_task_input_check_func(func_name, f=None, override=False)

Register a function that checks the input buffer map.

The input function should take a list of Tensors, which are the input/output Tensors of a TVM subgraph, and return a map from each input Tensor to its buffer name.

Parameters

func_name: Union[Function, str]

The check function to be registered, or its function name.

f: Optional[Function]

The check function to be registered.

override: boolean = False

Whether to override an existing entry.

Examples

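from typing import Dict, List

import tvm
from tvm import auto_scheduler
from tvm.te import Tensor

@auto_scheduler.register_task_input_check_func
def check_task_input_by_placeholder_name(args: List[Tensor]) -> Dict[Tensor, str]:
    # Map every input placeholder that has a non-default name to that name
    tensor_input_map = {}
    for arg in args:
        if isinstance(arg.op, tvm.te.PlaceholderOp):
            if arg.op.name != "placeholder":
                tensor_input_map[arg] = arg.op.name
    return tensor_input_map
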
tvm.auto_scheduler.register_workload(func_name, f=None, override=False)

Register a function that generates a certain workload.

The input function should take hashable and JSON-serializable arguments (int, float, tuple of int, tvm.te.Tensor, …) and return a list of tvm.te.Tensor.

Parameters

func_name: Union[Function, str]

The generation function that returns the compute declaration Tensors, or its function name.

f: Optional[Function]

The generation function to be registered.

override: boolean = False

Whether to override an existing entry.

Examples

from tvm import auto_scheduler, te

@auto_scheduler.register_workload
def matmul(N, M, K):
    # A is N x K, B is K x M; C = A @ B reduces over the axis k
    A = te.placeholder((N, K), name='A')
    B = te.placeholder((K, M), name='B')
    k = te.reduce_axis((0, K), name='k')
    C = te.compute((N, M), lambda i, j: te.sum(A[i][k] * B[k][j], axis=[k]), name='C')
    return [A, B, C]

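
The decorator-or-name calling convention and the `override` flag can be sketched with a plain dictionary registry. Everything here is hypothetical and pure Python (`register`, `WORKLOAD_FUNC_REGISTRY`); it illustrates the registration pattern, not TVM's internals, and raising RuntimeError on duplicates is a choice made for this sketch.

```python
from typing import Callable, Dict, Optional, Union

WORKLOAD_FUNC_REGISTRY: Dict[str, Callable] = {}  # hypothetical registry


def register(func_name: Union[Callable, str], f: Optional[Callable] = None, override: bool = False):
    """Register `f` under a name; usable as @register or @register("name")."""
    if callable(func_name):
        f, func_name = func_name, func_name.__name__
    if not isinstance(func_name, str):
        raise ValueError("expect a function or a string name")

    def do_register(myf: Callable) -> Callable:
        if func_name in WORKLOAD_FUNC_REGISTRY and not override:
            raise RuntimeError(f"{func_name} has been registered already")
        WORKLOAD_FUNC_REGISTRY[func_name] = myf
        return myf

    return do_register(f) if f is not None else do_register


@register
def add_one(n):
    return n + 1


@register("double")
def _double(n):
    return 2 * n


print(sorted(WORKLOAD_FUNC_REGISTRY))  # ['add_one', 'double']
```

Registering by name lets a workload key (which stores only the name and arguments) be turned back into the function that builds the computation.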
tvm.auto_scheduler.remove_index_check(tensor)

Remove the safety check in the indexing function for a tensor. This is done by monkey patching its indexing function. After removing the check, we are allowed to create a temporarily wrong IR and fix it later in other places.

Parameters

tensor: Tensor

The tensor whose index check is removed.

tvm.auto_scheduler.rewrite_compute_body(compute_tensor, new_layout)

Rewrite the body of a ComputeOp according to a new layout of a placeholder.

tvm.auto_scheduler.rewrite_tensor_shape(tensor, shape)

Rewrite the tensor shape.

tvm.auto_scheduler.save_records(filename, inputs, results)

Append measure records to a file.

Parameters

filename: str

File name to write log to.

inputs: List[MeasureInput]

The MeasureInputs to be written.

results: List[MeasureResult]

The MeasureResults to be written.
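
The append semantics of save_records, and reading the pairs back in order as load_records does, can be mimicked with one JSON object per line. This is a hypothetical pure-Python sketch with a made-up record schema (`append_records_jsonl` and `read_records_jsonl` are not TVM functions); it only shows why calling the writer twice keeps earlier records.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def append_records_jsonl(filename: str, inputs: List[Dict], results: List[Dict]) -> None:
    """Append one JSON line per (input, result) pair; existing lines are kept."""
    with open(filename, "a", encoding="utf-8") as f:
        for inp, res in zip(inputs, results):
            f.write(json.dumps({"input": inp, "result": res}) + "\n")


def read_records_jsonl(filename: str) -> List[Tuple[Dict, Dict]]:
    """Read back the (input, result) pairs in file order, skipping blank lines."""
    pairs = []
    with open(filename, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                pairs.append((rec["input"], rec["result"]))
    return pairs


path = os.path.join(tempfile.mkdtemp(), "records.json")
append_records_jsonl(path, [{"workload": "mm", "state": 1}], [{"costs": [1.2]}])
append_records_jsonl(path, [{"workload": "mm", "state": 2}], [{"costs": [0.9]}])
print(len(read_records_jsonl(path)))  # 2: the second call appended instead of overwriting
```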