tvm.contrib.graph_executor#

Minimal graph executor that executes graphs containing TVM PackedFuncs.

class tvm.contrib.graph_executor.GraphModule(module)[source]#

Wrapper runtime module.

This is a thin wrapper around the underlying TVM module. You can also directly call set_input, run, and get_output on the underlying module functions.

Parameters:

module (tvm.runtime.Module) -- The internal tvm module that holds the actual graph functions.

module#

The internal tvm module that holds the actual graph functions.

Type:

tvm.runtime.Module

Examples

import tvm
from tvm import relay
from tvm.contrib import graph_executor

# build the library using the graph executor
lib = relay.build(...)
lib.export_library("compiled_lib.so")
# load it back as a runtime module
lib: tvm.runtime.Module = tvm.runtime.load_module("compiled_lib.so")
# call the library factory function for "default" to create
# a new runtime.Module, then wrap it with a graph module
dev = tvm.cpu()
gmod = graph_executor.GraphModule(lib["default"](dev))
# use the graph module; "data" is an input tensor, e.g. tvm.nd.array(np_array)
gmod.set_input("x", data)
gmod.run()
benchmark(device, func_name='run', repeat=5, number=5, min_repeat_ms=None, limit_zero_time_iterations=100, end_to_end=False, cooldown_interval_ms=0, repeats_to_cooldown=1, **kwargs)[source]#

Calculate runtime of a function by repeatedly calling it.

Use this function to get an accurate measurement of the runtime of a function. The function is run multiple times in order to account for variability in measurements, processor speed, and other external factors. Mean, median, standard deviation, min, and max runtimes are all reported. On GPUs (CUDA and ROCm specifically), special on-device timers are used so that synchronization and data transfer operations are not counted towards the runtime. This allows for fair comparison of runtimes across different functions and models. The end_to_end flag switches this behavior to include data transfer operations in the runtime.

The benchmarking loop looks approximately like so:

for r in range(repeat):
    time_start = now()
    for n in range(number):
        func_name()
    time_end = now()
    total_times.append((time_end - time_start)/number)
Parameters:
  • func_name (str) -- The function to benchmark. This is ignored if end_to_end is true.

  • repeat (int) -- Number of times to run the outer loop of the timing code (see above). The output will contain repeat number of datapoints.

  • number (int) -- Number of times to run the inner loop of the timing code. This inner loop is run in between the timer starting and stopping. In order to amortize any timing overhead, number should be increased when the runtime of the function is small (less than 1/10 of a millisecond).

  • min_repeat_ms (Optional[int]) -- If set, the inner loop will be run until it takes longer than min_repeat_ms milliseconds. This can be used to ensure that the function is run enough to get an accurate measurement.

  • limit_zero_time_iterations (Optional[int]) -- The maximum number of repeats when measured time is equal to 0. It helps to avoid hanging during measurements.

  • end_to_end (bool) -- If set, include time to transfer input tensors to the device and time to transfer returned tensors in the total runtime. This will give accurate timings for end to end workloads.

  • cooldown_interval_ms (Optional[int]) -- The cooldown interval in milliseconds between the number of repeats defined by repeats_to_cooldown.

  • repeats_to_cooldown (Optional[int]) -- The number of repeats before the cooldown is activated.

  • kwargs (Dict[str, Object]) -- Named arguments to the function. These are cached before running timing code, so that data transfer costs are not counted in the runtime.

Returns:

timing_results -- Runtimes of the function. Use .mean to access the mean runtime, use .results to access the individual runtimes (in seconds).

Return type:

BenchmarkResult
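
For example, a minimal usage sketch (assuming the gmod and dev objects from the GraphModule example above; the input name "x" and shape are illustrative):

import numpy as np

# feed a representative input before timing
gmod.set_input("x", np.random.rand(1, 3, 224, 224).astype("float32"))

# time the "run" function; on CUDA/ROCm this excludes synchronization
# and data transfer costs unless end_to_end=True
result = gmod.benchmark(dev, repeat=10, number=5)
print(result)          # summary statistics (mean, median, std, min, max)
print(result.mean)     # mean runtime in seconds
print(result.results)  # individual per-repeat runtimes in seconds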

debug_get_output(node, out)[source]#

Run the graph up to the given node and copy that node's output into out.

Parameters:
  • node (int / str) -- The node index or name

  • out (NDArray) -- The output array container

get_input(index, out=None)[source]#

Get the index-th input, optionally copying it into out.

Parameters:
  • index (int) -- The input index

  • out (NDArray) -- The output array container

get_input_index(name)[source]#

Get the input index given an input name.

Parameters:

name (str) -- The input key name

Returns:

index -- The input index. -1 will be returned if the given input name is not found.

Return type:

int
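
For example (a small sketch reusing gmod from above; the input name "x" is illustrative):

idx = gmod.get_input_index("x")
if idx == -1:
    raise KeyError("input 'x' not found in the graph")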

get_input_info()[source]#

Return the 'shape' and 'dtype' dictionaries of the graph.

Note

We can't simply get the input tensors from a TVM graph because weight tensors are treated equivalently. Therefore, to find the input tensors we look at the 'arg_nodes' in the graph (which are either weights or inputs) and check which ones don't appear in the params (where the weights are stored). These nodes are therefore inferred to be input tensors.

Returns:

  • shape_dict (Map) -- Shape dictionary - {input_name: tuple}.

  • dtype_dict (Map) -- dtype dictionary - {input_name: dtype}.
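
As an illustration, a hedged sketch that feeds random data to every graph input (assumes gmod from above; the generated dtypes follow the graph's declared dtypes):

import numpy as np

shape_dict, dtype_dict = gmod.get_input_info()
for name, shape in shape_dict.items():
    data = np.random.uniform(size=list(shape)).astype(dtype_dict[name])
    gmod.set_input(name, data)
gmod.run()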

get_num_inputs()[source]#

Get the number of inputs to the graph.

Returns:

count -- The number of inputs.

Return type:

int

get_num_outputs()[source]#

Get the number of outputs from the graph.

Returns:

count -- The number of outputs.

Return type:

int

get_output(index, out=None)[source]#

Get the index-th output, optionally copying it into out.

Parameters:
  • index (int) -- The output index

  • out (NDArray) -- The output array container
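
For example, both retrieval styles (a sketch assuming gmod has already run; the output shape (1, 1000) is illustrative):

# let the executor allocate the result
out = gmod.get_output(0).numpy()

# or reuse a preallocated buffer across calls to avoid repeated allocation
buf = tvm.nd.empty((1, 1000), "float32")
gmod.get_output(0, buf)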

get_output_index(name)[source]#

Get the output index given an output name.

Parameters:

name (str) -- The output key name

Returns:

index -- The output index. -1 will be returned if the given output name is not found.

Return type:

int

get_output_info()[source]#

Return the 'shape' and 'dtype' dictionaries of the graph.

Returns:

  • shape_dict (Map) -- Shape dictionary - {output_name: tuple}.

  • dtype_dict (Map) -- dtype dictionary - {output_name: dtype}.

load_params(params_bytes)[source]#

Load parameters from a serialized byte array of a parameter dict.

Parameters:

params_bytes (bytearray) -- The serialized parameter dict.
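
For example, a hedged sketch of the round trip (assuming params is a parameter dict such as the one produced by relay.build):

from tvm.runtime import save_param_dict

params_bytes = save_param_dict(params)  # serialize dict of str -> NDArray
gmod.load_params(params_bytes)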

run(**input_dict)[source]#

Run forward execution of the graph.

Parameters:

input_dict (dict of str to NDArray) -- Input values to be fed to the graph before execution, keyed by input name.

set_input(key=None, value=None, **params)[source]#

Set inputs to the module via kwargs.

Parameters:
  • key (int or str) -- The input key

  • value (NDArray or array-like) -- The input value

  • params (dict of str to NDArray) -- Additional named inputs, given as keyword arguments
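
Inputs can be set one at a time or in bulk via keyword arguments; a sketch (the input name "x" and shape are illustrative):

import numpy as np

x = np.zeros((1, 3, 224, 224), dtype="float32")

# set a single input by name, then execute
gmod.set_input("x", x)
gmod.run()

# or set several inputs at once via kwargs; run() forwards its
# kwargs to set_input as well
gmod.set_input(x=x)
gmod.run(x=x)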

set_input_zero_copy(key=None, value=None, **params)[source]#

Set inputs to the module via kwargs with zero memory copy.

Parameters:
  • key (int or str) -- The input key

  • value (NDArray, or a tensor supporting DLPack) -- The input value

  • params (dict of str to NDArray) -- Additional named inputs, given as keyword arguments
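
A hedged sketch: because no copy is made, the buffer must already live on the target device and match the expected shape, dtype, and layout (the input name "x" and shape are illustrative):

import numpy as np

buf = tvm.nd.array(np.zeros((1, 3, 224, 224), dtype="float32"), device=dev)
gmod.set_input_zero_copy("x", buf)
gmod.run()  # the graph reads directly from buf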

set_output_zero_copy(key, value)[source]#

Set outputs of the module with zero memory copy.

Parameters:
  • key (int or str) -- The output key

  • value (NDArray, or a tensor supporting DLPack) -- The output value

share_params(other, params_bytes)[source]#

Share parameters from a pre-existing GraphExecutor instance.

Parameters:
  • other (GraphExecutor) -- The parent GraphExecutor from which this instance should share its parameters.

  • params_bytes (bytearray) -- The serialized parameter dict (used only for the parameter names).
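
A hedged sketch of two executors built from the same library sharing one copy of the weights (gmod, lib, dev, and params_bytes are assumed from the examples above):

# a second executor built from the same compiled library
gmod2 = graph_executor.GraphModule(lib["default"](dev))
# reuse gmod's parameter storage; params_bytes only supplies the names
gmod2.share_params(gmod, params_bytes)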

tvm.contrib.graph_executor.create(graph_json_str, libmod, device)[source]#

Create a runtime executor module given a graph and module.

Parameters:
  • graph_json_str (str) -- The graph to be deployed, in the JSON format produced by the graph compiler. The graph can contain operators (tvm_op) that point to the name of a PackedFunc in libmod.

  • libmod (tvm.runtime.Module) -- The module containing the corresponding functions.

  • device (Device or list of Device) -- The device on which to deploy the module. It can be local or remote when there is only one Device. Otherwise, the first device in the list is used as the primary device. All devices should be given for heterogeneous execution.

Returns:

graph_module -- Runtime graph module that can be used to execute the graph.

Return type:

GraphModule

Note

See also tvm.contrib.graph_executor.GraphModule for examples of directly constructing a GraphModule from an exported Relay compiled library.
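
A hedged sketch of manual construction from a relay.build result; get_graph_json, get_lib, and get_params are assumed accessors on the factory module, and the lib["default"] route shown in the GraphModule example is usually simpler:

import tvm
from tvm import relay
from tvm.contrib import graph_executor
from tvm.runtime import save_param_dict

lib = relay.build(mod, target="llvm", params=params)  # mod, params assumed
dev = tvm.cpu()
gmod = graph_executor.create(lib.get_graph_json(), lib.get_lib(), dev)
gmod.load_params(save_param_dict(lib.get_params()))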

tvm.contrib.graph_executor.get_device(libmod, device)[source]#

Parse and validate all the device(s).

Parameters:
  • libmod (tvm.runtime.Module) -- The module containing the corresponding functions

  • device (Device or list of Device) -- The device(s) to parse and validate

Returns:

  • device (list of Device) -- The validated devices

  • num_rpc_dev (int) -- The number of RPC devices

  • device_type_id (list of int) -- List of device types and device ids