tvm.runtime.disco

TVM distributed runtime API.

class tvm.runtime.disco.DModule(dref: DRef, session: Session)

A Module in a Disco session.

class tvm.runtime.disco.DPackedFunc(dref: DRef, session: Session)

A PackedFunc in a Disco session.

class tvm.runtime.disco.DRef

An object that exists on all workers. The controller process assigns a unique "register id" to each object, and each worker process uses this id to refer to its own local copy of the object.

debug_copy_from(worker_id: int, value: ndarray | NDArray) -> None

Copy an NDArray value to a remote worker for debugging purposes.

Parameters:
  • worker_id (int) -- The id of the worker to be copied to.

  • value (Union[numpy.ndarray, NDArray]) -- The value to be copied.

debug_get_from_remote(worker_id: int) -> Any

Get the value of a DRef from a remote worker. This method is intended for debugging only.

Parameters:

worker_id (int) -- The id of the worker to be fetched from.

Returns:

value -- The value of the register.

Return type:

object
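As a rough illustration of the debug accessor above, the helper below reads a DRef back from every worker. It is only a sketch: `collect_from_workers` is a hypothetical name, not part of TVM, and it assumes nothing beyond the `debug_get_from_remote(worker_id)` method documented here.

```python
def collect_from_workers(dref, num_workers):
    """Fetch the per-worker value of `dref` (debugging only).

    `dref` is expected to behave like a Disco DRef, i.e. to expose
    `debug_get_from_remote(worker_id)`. Hypothetical helper, not part of TVM.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # One fetch per worker, in worker-id order.
    return [dref.debug_get_from_remote(worker_id) for worker_id in range(num_workers)]
```

With a real session, `num_workers` would typically come from `session.num_workers`.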

class tvm.runtime.disco.ProcessSession(num_workers: int, num_groups: int = 1, entrypoint: str = 'tvm.exec.disco_worker')

A Disco session backed by pipe-based multi-processing.

class tvm.runtime.disco.Session

A Disco interactive session. It allows users to interact with the Disco command queue using various PackedFunc calling conventions.

allgather(src: DRef, dst: DRef, in_group: bool = True) -> DRef

Perform an allgather operation on an array.

Parameters:
  • src (DRef) -- The array to be gathered from.

  • dst (DRef) -- The array to be gathered to.

  • in_group (bool) -- Whether the allgather operation is performed within each group (the default) or globally.
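To make the data movement concrete, here is a plain-Python model of what an allgather conventionally does. It is not TVM code: `model_allgather` is a hypothetical name, workers are represented as list entries, and the exact shape Disco expects for `dst` is not stated in this reference, so the model simply concatenates chunks in worker-id order.

```python
def model_allgather(per_worker_chunks):
    """Model allgather with plain lists (not TVM code).

    per_worker_chunks[i] is the chunk held by worker i. After the
    collective, every worker holds all chunks in worker-id order.
    """
    gathered = [item for chunk in per_worker_chunks for item in chunk]
    # Each worker receives its own copy of the gathered data.
    return [list(gathered) for _ in per_worker_chunks]
```

For two workers holding `[1, 2]` and `[3, 4]`, both end up with `[1, 2, 3, 4]`.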

allreduce(src: DRef, dst: DRef, op: str = 'sum', in_group: bool = True) -> DRef

Perform an allreduce operation on an array.

Parameters:
  • src (DRef) -- The array to be reduced.

  • dst (DRef) -- The array that receives the reduced result.

  • op (str = "sum") -- The reduce operation to be performed. Available options are "sum", "prod", "min", "max", and "avg".

  • in_group (bool) -- Whether the reduce operation is performed within each group (the default) or globally.
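The five reduce operations can be modelled with plain Python lists. The sketch below is not TVM code: `model_allreduce` and `REDUCERS` are hypothetical names, and it assumes the conventional elementwise meaning of each operation, with every worker receiving the same result.

```python
from functools import reduce
import operator

# Elementwise reducers keyed by the op names listed above.
REDUCERS = {
    "sum": sum,
    "prod": lambda xs: reduce(operator.mul, xs, 1),
    "min": min,
    "max": max,
    "avg": lambda xs: sum(xs) / len(xs),
}


def model_allreduce(per_worker, op="sum"):
    """Model allreduce with plain lists (not TVM code).

    per_worker[i] is the equally sized vector held by worker i.
    """
    if op not in REDUCERS:
        raise ValueError(f"Unsupported reduce op: {op}")
    # zip(*per_worker) groups element j of every worker together.
    reduced = [REDUCERS[op](column) for column in zip(*per_worker)]
    # Every worker ends up with the same reduced vector.
    return [list(reduced) for _ in per_worker]
```

For example, `model_allreduce([[1, 2], [3, 4]], "sum")` leaves `[4, 6]` on both workers.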

broadcast(src: ndarray | NDArray, dst: DRef | None = None, in_group: bool = True) -> DRef

Broadcast an array to all workers.

Parameters:
  • src (Union[np.ndarray, NDArray]) -- The array to be broadcast.

  • dst (Optional[DRef]) -- The output array. If None, an array matching the shape and dtype of src will be allocated on each worker.

  • in_group (bool) -- Whether the broadcast operation is performed within each group (the default) or globally.

Returns:

output_array -- The DRef containing the broadcast data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type:

DRef
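A minimal sketch of how `broadcast` might be combined with the debug accessor to check the result on each worker. The helper name `broadcast_and_inspect` is hypothetical; `session` is assumed to behave like a Disco Session, and only `broadcast`, `num_workers`, and `DRef.debug_get_from_remote` from this page are used.

```python
def broadcast_and_inspect(session, array):
    """Broadcast `array` and read it back from every worker.

    Hypothetical helper. Reading back with debug_get_from_remote is
    meant for debugging, not for production data paths.
    """
    # dst is omitted, so a matching array is allocated on each worker.
    d_array = session.broadcast(array)
    return [d_array.debug_get_from_remote(i) for i in range(session.num_workers)]
```

Each entry of the returned list should hold the same data as `array`.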

broadcast_from_worker0(src: DRef, dst: DRef, in_group: bool = True) -> DRef

Broadcast an array from worker-0 to all other workers.

Parameters:
  • src (DRef) -- The array to be broadcast.

  • dst (DRef) -- The output array.

  • in_group (bool) -- Whether the broadcast operation is performed within each group (the default) or globally.

call_packed(func: DRef, *args) -> DRef

Call a PackedFunc on workers with variadic arguments.

Parameters:
  • func (DRef) -- The PackedFunc to be called.

  • *args (various types) -- The variadic arguments. Supported types include: integers and floating point numbers; DLDataType; DLDevice; str (std::string in C++); and DRef.

Returns:

return_value -- The return value of the function call.

Return type:

various types

Notes

Examples of unsupported types: NDArray and DLTensor; TVM Objects, including PackedFunc, Module and String.
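The lookup-then-call pattern can be sketched as follows. `call_global` is a hypothetical helper, not part of TVM; it relies only on `get_global_func` and `call_packed` as documented on this page and does no argument validation of its own.

```python
def call_global(session, name, *args):
    """Look up a global function on the workers and call it.

    Hypothetical helper. `args` must already be of types that
    call_packed supports (numbers, str, DRef, and so on).
    """
    # get_global_func returns a DRef to the function on the workers.
    func = session.get_global_func(name)
    return session.call_packed(func, *args)
```

For instance, `call_global(sess, "my_pkg.add_one", d_array)` would invoke a function registered under the made-up name `my_pkg.add_one` on every worker, passing each worker its own copy of `d_array`.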

copy_from_worker_0(host_array: NDArray, remote_array: DRef) -> None

Copy an NDArray from worker-0 to the controller-side NDArray.

Parameters:
  • host_array (NDArray) -- The controller-side array that receives the data copied from worker-0.

  • remote_array (DRef) -- The NDArray on worker-0.

copy_to_worker_0(host_array: NDArray, remote_array: DRef | None = None) -> DRef

Copy the controller-side NDArray to worker-0.

Parameters:
  • host_array (NDArray) -- The array to be copied to worker-0.

  • remote_array (Optional[DRef]) -- The destination NDArray on worker-0.

Returns:

output_array -- The DRef containing the copied data on worker-0, and NullOpt on all other workers. If remote_array was provided, this return value is the same as remote_array. Otherwise, it is the newly allocated space.

Return type:

DRef

empty(shape: Sequence[int], dtype: str, device: Device | None = None, worker0_only: bool = False, in_group: bool = True) -> DRef

Create an empty NDArray on all workers and attach them to a DRef.

Parameters:
  • shape (tuple of int) -- The shape of the NDArray.

  • dtype (str) -- The data type of the NDArray.

  • device (Optional[Device] = None) -- The device of the NDArray.

  • worker0_only (bool) -- If False (default), allocate an array on each worker. If True, only allocate an array on worker-0.

  • in_group (bool) -- Takes effect only when worker0_only is True. If True (default), allocate an array on the first worker of each group. If False, only allocate an array on worker-0 globally.

Returns:

array -- The created NDArray.

Return type:

DRef
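The interaction between `worker0_only` and `in_group` is easier to see with a small model. The function below is not TVM code: `allocating_workers` is a hypothetical name, and it assumes that workers are split into equally sized, contiguous groups, which this page does not state explicitly.

```python
def allocating_workers(num_workers, num_groups=1, worker0_only=False, in_group=True):
    """Return the worker ids that would allocate an array (model only).

    Assumes equally sized, contiguous groups of workers.
    """
    if num_groups <= 0 or num_workers % num_groups != 0:
        raise ValueError("num_workers must be a positive multiple of num_groups")
    if not worker0_only:
        # Default: every worker allocates.
        return list(range(num_workers))
    if not in_group:
        # Only the global worker-0 allocates.
        return [0]
    group_size = num_workers // num_groups
    # First worker of each group: 0, group_size, 2 * group_size, ...
    return list(range(0, num_workers, group_size))
```

Under these assumptions, four workers in two groups with `worker0_only=True` allocate on workers 0 and 2 by default, and on worker 0 alone when `in_group=False`.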

gather_to_worker0(from_array: DRef, to_array: DRef, in_group: bool = True) -> None

Gather an array from all other workers to worker-0.

Parameters:
  • from_array (DRef) -- The array to be gathered from.

  • to_array (DRef) -- The array to be gathered to.

  • in_group (bool) -- Whether the gather operation is performed within each group (the default) or globally.

get_global_func(name: str) -> DRef

Get a global function on workers.

Parameters:

name (str) -- The name of the global function.

Returns:

func -- The global packed function.

Return type:

DRef

import_python_module(module_name: str) -> None

Import a Python module in each worker.

This may be required before call

Parameters:

module_name (str) -- The Python module name, as it would be used in a Python import statement.

init_ccl(ccl: str, *device_ids)

Initialize the underlying collective communication library.

Parameters:
  • ccl (str) -- The name of the collective communication library. Currently supported libraries are nccl, rccl, and mpi.

  • *device_ids (int) -- The device IDs to be used by the underlying communication library.
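Because `device_ids` is variadic, a list of ids has to be unpacked at the call site. The wrapper below sketches that together with a name check against the libraries listed above; `init_ccl_checked` and `SUPPORTED_CCL` are hypothetical names, not part of TVM.

```python
SUPPORTED_CCL = ("nccl", "rccl", "mpi")


def init_ccl_checked(session, ccl, device_ids):
    """Validate the library name, then initialize it on the session.

    Hypothetical wrapper around Session.init_ccl.
    """
    if ccl not in SUPPORTED_CCL:
        raise ValueError(f"Unsupported ccl {ccl!r}; expected one of {SUPPORTED_CCL}")
    # init_ccl takes the device ids as variadic positional arguments.
    session.init_ccl(ccl, *device_ids)
```

A call such as `init_ccl_checked(sess, "nccl", [0, 1])` would then forward to `sess.init_ccl("nccl", 0, 1)`.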

load_vm_module(path: str, device: Device | None = None) -> DModule

Load a VM module from a file.

Parameters:
  • path (str) -- The path to the VM module file.

  • device (Optional[Device] = None) -- The device to load the VM module to. Defaults to the default device of each worker.

Returns:

module -- The loaded VM module.

Return type:

DModule

property num_workers: int

Return the number of workers in the session.

scatter(src: ndarray | NDArray, dst: DRef | None = None, in_group: bool = True) -> DRef

Scatter an array across all workers.

Parameters:
  • src (Union[np.ndarray, NDArray]) -- The array to be scattered. The first dimension of this array, src.shape[0], must be equal to the number of workers.

  • dst (Optional[DRef]) -- The output array. If None, an array with compatible shape and the same dtype as src will be allocated on each worker.

  • in_group (bool) -- Whether the scatter operation is performed within each group (the default) or globally.

Returns:

output_array -- The DRef containing the scattered data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type:

DRef
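The shape constraint on `src` can be illustrated with a plain-Python model. This is not TVM code: `model_scatter` is a hypothetical name, nested lists stand in for arrays, and a single group of workers is assumed.

```python
def model_scatter(src, num_workers):
    """Model scatter with nested lists (not TVM code).

    Worker i receives src[i], so the first dimension of src must
    equal the number of workers.
    """
    if len(src) != num_workers:
        raise ValueError(
            f"src has first dimension {len(src)}, expected {num_workers}"
        )
    # Slice i along the first dimension goes to worker i.
    return [src[i] for i in range(num_workers)]
```

With two workers, `model_scatter([[1, 2], [3, 4]], 2)` gives worker 0 the row `[1, 2]` and worker 1 the row `[3, 4]`.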

scatter_from_worker0(from_array: DRef, to_array: DRef, in_group: bool = True) -> None

Scatter an array from worker-0 to all other workers.

Parameters:
  • from_array (DRef) -- The array to be scattered. The first dimension of this array must be equal to the number of workers.

  • to_array (DRef) -- The output array.

  • in_group (bool) -- Whether the scatter operation is performed within each group (the default) or globally.

shutdown()

Shut down the Disco session.

sync_worker_0() -> None

Synchronize the controller with worker-0. The call blocks until worker-0 has finished executing all pending instructions.
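The copy and synchronization methods are typically used together. The sketch below shows one plausible ordering: copy to worker-0, copy back, then synchronize before reading the result. `round_trip_via_worker0` is a hypothetical helper, and it assumes both arrays are TVM NDArrays of the same shape and dtype.

```python
def round_trip_via_worker0(session, host_array, result_array):
    """Copy host_array to worker-0 and back into result_array.

    Hypothetical helper using copy_to_worker_0, copy_from_worker_0
    and sync_worker_0 as documented on this page.
    """
    # remote_array is omitted, so new space is allocated on worker-0.
    d_array = session.copy_to_worker_0(host_array)
    session.copy_from_worker_0(result_array, d_array)
    # Wait until worker-0 has executed the queued copies before
    # the caller reads result_array.
    session.sync_worker_0()
    return result_array
```

Calling `sync_worker_0` last reflects the assumption that the copy commands are queued rather than executed immediately.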

class tvm.runtime.disco.SocketSession(num_nodes: int, num_workers_per_node: int, num_groups: int, host: str, port: int)

A Disco session backed by socket-based multi-node communication.

class tvm.runtime.disco.ThreadedSession(num_workers: int, num_groups: int = 1)

A Disco session backed by multi-threading.
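Choosing between the single-node session classes can be wrapped in a small factory. The sketch below is illustrative only: `make_session` and its `kind` strings are hypothetical, TVM is imported lazily so the argument check runs without it, and the constructors are called with the keyword names shown in the signatures on this page.

```python
def make_session(kind, num_workers, num_groups=1):
    """Create a single-node Disco session of the requested kind.

    Hypothetical factory. `kind` is "threaded" or "process".
    """
    if kind not in ("threaded", "process"):
        raise ValueError(f"Unknown session kind: {kind!r}")
    # Imported lazily so the check above works without TVM installed.
    from tvm.runtime import disco

    if kind == "threaded":
        return disco.ThreadedSession(num_workers=num_workers, num_groups=num_groups)
    return disco.ProcessSession(num_workers=num_workers, num_groups=num_groups)
```

A session created this way, for example `sess = make_session("threaded", 2)`, should be released with `sess.shutdown()` once it is no longer needed.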