tvm.runtime.disco#

TVM distributed runtime API.

class tvm.runtime.disco.DModule(dref, session)[source]#

A Module in a Disco session.

class tvm.runtime.disco.DPackedFunc(dref, session)[source]#

A PackedFunc in a Disco session.

class tvm.runtime.disco.DRef[source]#

An object that exists on all workers. The controller process assigns a unique "register id" to each object, and the worker process uses this id to refer to the object residing on itself.

debug_copy_from(worker_id, value)[source]#

Copy an NDArray value to a remote worker for debugging purposes.

Parameters#

worker_id: int

The id of the worker to copy the value to.

value: Union[numpy.ndarray, NDArray]

The value to be copied.

Return type:

None

debug_get_from_remote(worker_id)[source]#

Get the value of a DRef from a remote worker. This method is intended for debugging only.

Parameters#

worker_id: int

The id of the worker to fetch the value from.

Returns#

value: object

The value of the register.

Parameters:

worker_id (int)

Return type:

Any
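A minimal sketch of the two debugging helpers above, assuming a TVM build with Disco enabled and NumPy installed. The helper name `debug_round_trip` is hypothetical, and the explicit CPU device is an assumption made so the sketch does not depend on a GPU; the session and DRef calls follow the signatures documented on this page.

```python
def debug_round_trip(num_workers=2):
    """Write a distinct array to each worker's copy of a DRef, then read them back."""
    # Imported inside the function so that defining it does not require TVM.
    import numpy as np
    import tvm
    from tvm.runtime import disco

    sess = disco.ThreadedSession(num_workers=num_workers)
    try:
        # One (2, 3) float32 NDArray per worker, all addressed through a single DRef.
        d_array = sess.empty((2, 3), "float32", device=tvm.cpu(0))
        for worker_id in range(num_workers):
            d_array.debug_copy_from(
                worker_id, np.full((2, 3), worker_id, dtype="float32")
            )
        # Each worker holds its own value, so the results differ per worker.
        return [
            d_array.debug_get_from_remote(worker_id).numpy()
            for worker_id in range(num_workers)
        ]
    finally:
        sess.shutdown()
```

Both calls are meant for inspection and tests; regular data movement would go through the session-level copy and collective methods instead.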

class tvm.runtime.disco.ProcessSession(num_workers, num_groups=1, entrypoint='tvm.exec.disco_worker')[source]#

A Disco session backed by pipe-based multi-processing.

Parameters:
  • num_workers (int)

  • num_groups (int)

  • entrypoint (str)

class tvm.runtime.disco.Session[source]#

A Disco interactive session. It lets users interact with the Disco command queue through various PackedFunc calling conventions.

_sync_worker(worker_id)[source]#

Synchronize the controller with a worker, blocking until the worker has finished executing all existing instructions. This function is usually called on worker-0, because it is the only worker assumed to be co-located with the controller. Syncing with other workers may not be supported and should be used only for debugging.

Parameters#

worker_id: int

The id of the worker to be synced with.

Parameters:

worker_id (int)

Return type:

None

allgather(src, dst, in_group=True)[source]#

Perform an allgather operation on an array.

Parameters#

src: DRef

The array to be gathered from.

dst: DRef

The array to be gathered to.

in_group: bool

Whether the allgather operation is performed within each group (True, the default) or globally across all workers (False).

Return type:

DRef
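The effect of allgather and of the in_group flag can be illustrated with a pure-Python reference model. This is not the Disco implementation: `reference_allgather` is a hypothetical helper, and it assumes that workers are split into contiguous groups of equal size, which this page does not state.

```python
def reference_allgather(per_worker, num_groups=1, in_group=True):
    """Return what each worker's dst would hold after an allgather.

    per_worker[i] is the list of values held by worker i (its src).
    """
    num_workers = len(per_worker)
    if num_workers % num_groups != 0:
        raise ValueError("num_workers must be divisible by num_groups")
    # Assumption: a group is a contiguous block of worker ids.
    group_size = num_workers // num_groups if in_group else num_workers
    result = []
    for worker_id in range(num_workers):
        start = (worker_id // group_size) * group_size
        gathered = []
        for peer in range(start, start + group_size):
            gathered.extend(per_worker[peer])
        result.append(gathered)
    return result


# Four workers in two groups: each worker sees only its own group's data.
print(reference_allgather([[0], [1], [2], [3]], num_groups=2))
```

With in_group=False the same call would give every worker the concatenation of all four inputs.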

allreduce(src, dst, op='sum', in_group=True)[source]#

Perform an allreduce operation on an array.

Parameters#

src: DRef

The array to be reduced.

dst: DRef

The array that receives the reduced result.

op: str = "sum"

The reduce operation to be performed. Available options are:

  • "sum"

  • "prod"

  • "min"

  • "max"

  • "avg"

in_group: bool

Whether the reduce operation is performed within each group (True, the default) or globally across all workers (False).

Return type:

DRef
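The five reduce operations can be sketched as an elementwise reduction across workers. The model below is a plain-Python illustration for a single group, not the Disco implementation; `REDUCERS` and `reference_allreduce` are hypothetical names.

```python
import math

# Maps each documented op name to a function over one element from every worker.
REDUCERS = {
    "sum": sum,
    "prod": math.prod,
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
}


def reference_allreduce(per_worker, op="sum"):
    """Reduce elementwise across workers; every worker receives the same result."""
    reduce_fn = REDUCERS[op]
    length = len(per_worker[0])
    if any(len(values) != length for values in per_worker):
        raise ValueError("all workers must hold arrays of the same length")
    reduced = [
        reduce_fn([values[i] for values in per_worker]) for i in range(length)
    ]
    return [list(reduced) for _ in per_worker]


# Two workers holding [1, 2] and [3, 4].
print(reference_allreduce([[1, 2], [3, 4]], op="sum"))  # [[4, 6], [4, 6]]
```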

broadcast(src, dst=None, in_group=True)[source]#

Broadcast an array to all workers.

Parameters#

src: Union[np.ndarray, NDArray]

The array to be broadcast.

dst: Optional[DRef]

The output array. If None, an array matching the shape and dtype of src will be allocated on each worker.

in_group: bool

Whether the broadcast operation is performed within each group (True, the default) or globally across all workers (False).

Returns#

output_array: DRef

The DRef containing the broadcast data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type:

DRef

broadcast_from_worker0(src, dst, in_group=True)[source]#

Broadcast an array from worker-0 to all other workers.

Parameters#

src: DRef

The array on worker-0 to be broadcast.

dst: DRef

The output array on each worker. Unlike broadcast, this method requires dst to be provided.

in_group: bool

Whether the broadcast operation is performed within each group (True, the default) or globally across all workers (False).

Return type:

DRef

call_packed(func, *args)[source]#

Call a PackedFunc on workers, passing variadic arguments.

Parameters#

func: PackedFunc

The function to be called.

*args: various types

The supported types for the variadic arguments are:

  • integers and floating point numbers

  • DLDataType

  • DLDevice

  • str (std::string in C++)

  • DRef

Returns#

return_value: various types

The return value of the function call.

Notes#

Examples of unsupported types:

  • NDArray, DLTensor

  • TVM Objects, including PackedFunc, Module and String

Parameters:

func (DRef)

Return type:

DRef

copy_from_worker_0(host_array, remote_array)[source]#

Copy an NDArray from worker-0 to the controller-side NDArray.

Parameters#

host_array: numpy.ndarray

The controller-side array that receives the data copied from worker-0.

remote_array: NDArray

The NDArray on worker-0.

Return type:

None

copy_to_worker_0(host_array, remote_array=None)[source]#

Copy the controller-side NDArray to worker-0.

Parameters#

host_array: NDArray

The array to be copied to worker-0.

remote_array: Optional[DRef]

The destination NDArray on worker-0.

Returns#

output_array: DRef

The DRef containing the copied data on worker-0, and NullOpt on all other workers. If remote_array was provided, this return value is the same as remote_array. Otherwise, it is the newly allocated space.

Return type:

DRef
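The two copy methods can be combined into a round trip between the controller and worker-0. The sketch below is hedged: `round_trip_via_worker_0` is a hypothetical helper, it assumes a TVM version that exposes `tvm.nd.array` and `tvm.nd.empty`, and it pins everything to the CPU so that no GPU is needed.

```python
def round_trip_via_worker_0(values):
    """Copy an array to worker-0 and back to the controller."""
    # Imported inside the function so that defining it does not require TVM.
    import numpy as np
    import tvm
    from tvm.runtime import disco

    np_array = np.asarray(values, dtype="float32")
    device = tvm.cpu(0)
    sess = disco.ThreadedSession(num_workers=2)
    try:
        # Destination array on the workers, passed explicitly as remote_array.
        remote = sess.empty(np_array.shape, "float32", device=device)
        sess.copy_to_worker_0(tvm.nd.array(np_array, device=device), remote)

        # Controller-side buffer that receives the data from worker-0.
        host = tvm.nd.empty(np_array.shape, "float32", device=device)
        sess.copy_from_worker_0(host, remote)
        # Wait until worker-0 has executed the queued copies before reading.
        sess.sync_worker_0()
        return host.numpy()
    finally:
        sess.shutdown()
```

The call to sync_worker_0 matters because session methods enqueue instructions; reading `host` before the worker has caught up could observe stale data.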

empty(shape, dtype, device=None, worker0_only=False, in_group=True)[source]#

Create an empty NDArray on all workers and attach them to a DRef.

Parameters#

shape: tuple of int

The shape of the NDArray.

dtype: str

The data type of the NDArray.

device: Optional[Device] = None

The device of the NDArray.

worker0_only: bool

If False (default), allocate an array on each worker. If True, only allocate an array on worker-0.

in_group: bool

Takes effect only when worker0_only is True. If True (default), allocate an array on the first worker of each group. If False, allocate an array only on worker-0 globally.

Returns#

array: DRef

The created NDArray.

Return type:

DRef
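How worker0_only and in_group interact is easier to see as a small decision function. The helper below, `workers_with_allocation`, is hypothetical and only models the description above; it assumes groups are contiguous blocks of equal size, so the first worker of each group has an id that is a multiple of the group size.

```python
def workers_with_allocation(num_workers, num_groups=1, worker0_only=False, in_group=True):
    """Return the ids of the workers on which empty() would allocate an array."""
    if num_workers % num_groups != 0:
        raise ValueError("num_workers must be divisible by num_groups")
    if not worker0_only:
        # Default: every worker gets its own array.
        return list(range(num_workers))
    if in_group:
        # Assumption: groups are contiguous, so group leaders are evenly spaced.
        group_size = num_workers // num_groups
        return list(range(0, num_workers, group_size))
    # worker0_only without grouping: only the global worker-0 allocates.
    return [0]


print(workers_with_allocation(4, num_groups=2, worker0_only=True))  # [0, 2]
```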

gather_to_worker0(from_array, to_array, in_group=True)[source]#

Gather an array from all other workers to worker-0.

Parameters#

from_array: DRef

The array to be gathered from.

to_array: DRef

The array to be gathered to.

in_group: bool

Whether the gather operation is performed within each group (True, the default) or globally across all workers (False).

Return type:

None

get_global_func(name)[source]#

Get a global function on workers.

Parameters#

name: str

The name of the global function.

Returns#

func: DRef

The global packed function.

Parameters:

name (str)

Return type:

DRef
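Looking up a global function and invoking it on every worker takes two session calls. The sketch below uses only `get_global_func` and `call_packed` as documented on this page; the helper name `call_on_all_workers` is hypothetical, and the function name passed in must already be registered on the workers.

```python
def call_on_all_workers(sess, func_name, *args):
    """Look up a global function by name on every worker and invoke it."""
    # DRef to the PackedFunc that lives on each worker.
    func = sess.get_global_func(func_name)
    # Arguments must be of the types call_packed supports (numbers, str, DRef, ...).
    # The result is a DRef to each worker's return value, not the value itself.
    return sess.call_packed(func, *args)
```

To inspect the result on a particular worker while debugging, the returned DRef's debug_get_from_remote method can be used.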

import_python_module(module_name)[source]#

Import a Python module in each worker.

This may be required before call

Parameters#

module_name: str

The Python module name, as it would be used in a Python import statement.

Parameters:

module_name (str)

Return type:

None

init_ccl(ccl, *device_ids)[source]#

Initialize the underlying collective communication library.

Parameters#

ccl: str

The name of the collective communication library. Currently supported libraries are:

  • nccl

  • rccl

  • mpi

*device_ids: int

The device IDs to be used by the underlying communication library.

Parameters:

ccl (str)
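A typical use of init_ccl is to set up the library once per session and then run collectives. The sketch below assumes a TVM build with NCCL support and at least two visible GPUs; the helper name `allreduce_sum_on_devices` is hypothetical, and only methods documented on this page are called.

```python
def allreduce_sum_on_devices(device_ids=(0, 1), ccl="nccl"):
    """Initialize a CCL on the given devices and sum one array across workers."""
    # Imported inside the function so that defining it does not require TVM.
    import numpy as np
    from tvm.runtime import disco

    num_workers = len(device_ids)
    sess = disco.ProcessSession(num_workers=num_workers)
    try:
        # One device id per worker, passed as variadic arguments.
        sess.init_ccl(ccl, *device_ids)

        src = sess.empty((4,), "float32")
        for worker_id in range(num_workers):
            src.debug_copy_from(
                worker_id, np.full((4,), worker_id + 1, dtype="float32")
            )

        dst = sess.empty((4,), "float32")
        sess.allreduce(src, dst, op="sum")
        # Every worker holds the same reduced array; read it from worker-0.
        return dst.debug_get_from_remote(0).numpy()
    finally:
        sess.shutdown()
```

With the default two devices, each element of the returned array would be the sum of the per-worker fill values 1 and 2.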

load_vm_module(path, device=None)[source]#

Load a VM module from a file.

Parameters#

path: str

The path to the VM module file.

device: Optional[Device] = None

The device to load the VM module to. Defaults to the default device of each worker.

Returns#

module: DModule

The loaded VM module.

Parameters:
  • path (str)

  • device (Device | None)

Return type:

DModule
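Once loaded, a DModule is typically used to fetch a function by name and call it with DRef arguments. The sketch below assumes that DModule supports indexing by function name and that the result is callable, which is how TVM's own Disco tests use it but is not documented on this page; `run_vm_function` is a hypothetical helper.

```python
def run_vm_function(sess, lib_path, func_name, *args):
    """Load a compiled VM module on every worker and call one of its functions."""
    # Each worker loads the module from the same path on its default device.
    mod = sess.load_vm_module(lib_path)
    # Assumption: mod[func_name] yields a callable handle to the remote function.
    func = mod[func_name]
    # Arguments follow the call_packed rules, so arrays must be passed as DRefs.
    return func(*args)
```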

scatter(src, dst=None, in_group=True)[source]#

Scatter an array across all workers.

Parameters#

src: Union[np.ndarray, NDArray]

The array to be scattered. The first dimension of this array, src.shape[0], must be equal to the number of workers.

dst: Optional[DRef]

The output array. If None, an array with compatible shape and the same dtype as src will be allocated on each worker.

in_group: bool

Whether the scatter operation is performed within each group (True, the default) or globally across all workers (False).

Returns#

output_array: DRef

The DRef containing the scattered data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type:

DRef
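The shape requirement on src can be illustrated with a plain-Python model in which nested lists stand in for arrays. This is a reference sketch for a single group, not the Disco implementation; `reference_scatter` is a hypothetical name, and it assumes "compatible shape" means that worker i receives the slice src[i], of shape src.shape[1:].

```python
def reference_scatter(src, num_workers):
    """Return the slice each worker would receive from a scatter of src."""
    if len(src) != num_workers:
        raise ValueError(
            f"src.shape[0] is {len(src)} but the session has {num_workers} workers"
        )
    # Worker i receives row i of src.
    return [src[worker_id] for worker_id in range(num_workers)]


# A (2, 3) array scattered over two workers gives each worker a length-3 row.
print(reference_scatter([[1, 2, 3], [4, 5, 6]], num_workers=2))
```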

scatter_from_worker0(from_array, to_array, in_group=True)[source]#

Scatter an array from worker-0 to all other workers.

Parameters#

from_array: DRef

The array on worker-0 to be scattered. The first dimension of this array must be equal to the number of workers.

to_array: DRef

The output array on each worker, which receives that worker's portion of from_array.

in_group: bool

Whether the scatter operation is performed within each group (True, the default) or globally across all workers (False).

Return type:

None

shutdown()[source]#

Shut down the Disco session.

sync_worker_0()[source]#

Synchronize the controller with worker-0, blocking until worker-0 has finished executing all existing instructions.

Return type:

None

property num_workers: int#

Return the number of workers in the session.

class tvm.runtime.disco.SocketSession(num_nodes, num_workers_per_node, num_groups, host, port)[source]#

A Disco session backed by socket-based multi-node communication.

Parameters:
  • num_nodes (int)

  • num_workers_per_node (int)

  • num_groups (int)

  • host (str)

  • port (int)

class tvm.runtime.disco.ThreadedSession(num_workers, num_groups=1)[source]#

A Disco session backed by multi-threading.

Parameters:
  • num_workers (int)

  • num_groups (int)

__init__(num_workers, num_groups=1)[source]#

Create a Disco session backed by multiple threads in the same process.

Parameters:
  • num_workers (int)

  • num_groups (int)

Return type:

None