tvm.runtime.disco#
TVM distributed runtime API.
- class tvm.runtime.disco.DRef[source]#
An object that exists on all workers. The controller process assigns a unique "register id" to each object, and the worker process uses this id to refer to the object residing on itself.
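The register-id scheme described above can be pictured with a toy model. This is a sketch in plain Python, not the Disco implementation: `ToyController`, `ToyWorker`, and `create` are hypothetical names, and the workers here are ordinary in-process objects rather than separate processes.

```python
class ToyWorker:
    """Hypothetical worker holding a register table: register id -> local object."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.registers = {}


class ToyController:
    """Hypothetical controller that hands out unique register ids."""

    def __init__(self, num_workers):
        self.workers = [ToyWorker(i) for i in range(num_workers)]
        self._next_reg_id = 0

    def create(self, make_value):
        """Allocate a fresh register id and materialize one object per worker."""
        reg_id = self._next_reg_id
        self._next_reg_id += 1
        for worker in self.workers:
            # Each worker stores its own object under the shared id.
            worker.registers[reg_id] = make_value(worker.worker_id)
        return reg_id  # plays the role of a DRef handle on the controller


controller = ToyController(num_workers=3)
ref_a = controller.create(lambda wid: [wid] * 2)
ref_b = controller.create(lambda wid: wid * 10)
```

The controller only ever holds the id; the data itself lives in each worker's register table.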
- class tvm.runtime.disco.ProcessSession(num_workers, num_groups=1, entrypoint='tvm.exec.disco_worker')[source]#
A Disco session backed by pipe-based multi-processing.
- class tvm.runtime.disco.Session[source]#
A Disco interactive session. It allows users to interact with the Disco command queue using various PackedFunc calling conventions.
- _sync_worker(worker_id)[source]#
Synchronize the controller with a worker, blocking until the worker finishes executing all existing instructions. This function is usually used for worker-0, because it is the only worker assumed to be colocated with the controller. Syncing with other workers may not be supported and should only be used for debugging purposes.
Parameters#
- worker_id: int
The id of the worker to be synced with.
- Parameters:
worker_id (int)
- Return type:
None
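The "wait until the worker finishes all existing instructions" behavior can be illustrated with a small stand-alone model. This sketch uses a thread and `queue.Queue` from the standard library; `ToyWorker`, `submit`, and `sync` are hypothetical names and say nothing about how Disco implements its command queue.

```python
import queue
import threading


class ToyWorker:
    """Hypothetical worker that executes queued instructions in FIFO order."""

    def __init__(self):
        self.instructions = queue.Queue()
        self.log = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            instruction = self.instructions.get()
            try:
                instruction()
            finally:
                # Mark the instruction as finished so that sync() can return.
                self.instructions.task_done()

    def submit(self, instruction):
        """Enqueue an instruction and return immediately."""
        self.instructions.put(instruction)

    def sync(self):
        """Block until every instruction submitted so far has been executed."""
        self.instructions.join()


worker = ToyWorker()
for i in range(5):
    worker.submit(lambda i=i: worker.log.append(i))
worker.sync()  # after this call, all five instructions have run
```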
- allgather(src, dst, in_group=True)[source]#
Perform an allgather operation on an array.
Parameters#
- src: DRef
The array to be gathered from.
- dst: DRef
The array to be gathered to.
- in_group: bool
Whether the allgather operation is performed within each group (True, the default) or globally across all workers (False).
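As a rough picture of what an allgather produces, the following plain-Python sketch models each worker's array as a list. It assumes that the destination stacks the per-worker sources in worker order along a new leading axis, and it ignores groups entirely; `toy_allgather` is a hypothetical helper, not part of the Disco API.

```python
def toy_allgather(per_worker_src):
    """Return each worker's dst after an allgather over all workers.

    per_worker_src[i] is the src array of worker i (as a list).
    """
    stacked = [list(src) for src in per_worker_src]
    # Every worker receives its own copy of the full stack.
    return [[list(row) for row in stacked] for _ in per_worker_src]


dst = toy_allgather([[1, 2], [3, 4], [5, 6]])
```

Under this assumption, a `src` of shape `(2,)` on three workers corresponds to a `dst` of shape `(3, 2)` on every worker.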
- allreduce(src, dst, op='sum', in_group=True)[source]#
Perform an allreduce operation on an array.
Parameters#
- src: DRef
The array to be reduced.
- dst: DRef
The array that receives the reduced result.
- op: str = "sum"
The reduce operation to be performed. Available options are "sum", "prod", "min", "max", and "avg".
- in_group: bool
Whether the reduce operation is performed within each group (True, the default) or globally across all workers (False).
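The effect of `op` and `in_group` can be sketched with a small model in plain Python. It assumes that workers are split into contiguous, equally sized groups (worker `i` belongs to group `i // group_size`); that layout, and the helper names `worker_groups` and `toy_allreduce`, are assumptions made for illustration rather than documented behavior.

```python
from functools import reduce

# Element-wise reducers matching the documented op names.
REDUCE_OPS = {
    "sum": sum,
    "prod": lambda xs: reduce(lambda a, b: a * b, xs),
    "min": min,
    "max": max,
    "avg": lambda xs: sum(xs) / len(xs),
}


def worker_groups(num_workers, num_groups, in_group):
    """List the worker ids that reduce together (assumed contiguous groups)."""
    if not in_group:
        return [list(range(num_workers))]
    size = num_workers // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]


def toy_allreduce(per_worker, op="sum", num_groups=1, in_group=True):
    """per_worker[i] is worker i's src; returns the dst of every worker."""
    out = [None] * len(per_worker)
    for group in worker_groups(len(per_worker), num_groups, in_group):
        columns = zip(*(per_worker[w] for w in group))
        reduced = [REDUCE_OPS[op](col) for col in columns]
        for w in group:
            out[w] = list(reduced)  # every member receives the same result
    return out


data = [[1, 2], [3, 4], [5, 6], [7, 8]]
per_group = toy_allreduce(data, op="sum", num_groups=2, in_group=True)
global_max = toy_allreduce(data, op="max", num_groups=2, in_group=False)
```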
- broadcast(src, dst=None, in_group=True)[source]#
Broadcast an array to all workers.
Parameters#
- src: Union[np.ndarray, NDArray]
The array to be broadcast.
- dst: Optional[DRef]
The output array. If None, an array matching the shape and dtype of src will be allocated on each worker.
- in_group: bool
Whether the broadcast operation is performed within each group (True, the default) or globally across all workers (False).
Returns#
output_array: DRef
The DRef containing the broadcast data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.
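A possible usage pattern for `broadcast` is sketched below. Several things here are assumptions rather than facts stated in this section: that `ProcessSession` accepts `num_workers` as a keyword, that TVM was built with NCCL and two GPUs are available, and that `DRef.debug_get_from_remote` and `Session.shutdown` exist in the installed TVM version. The helper `broadcast_and_fetch` and the function `main` are hypothetical names.

```python
def broadcast_and_fetch(sess, host_array, num_workers):
    """Broadcast host_array and read back the copy held by each worker."""
    d_array = sess.broadcast(host_array)  # dst=None: allocated on each worker
    # Assumption: DRef exposes debug_get_from_remote(worker_id).
    return [d_array.debug_get_from_remote(i) for i in range(num_workers)]


def main():
    # Imports are local so that the helper above stays importable without TVM.
    import numpy as np
    from tvm.runtime import disco as di

    sess = di.ProcessSession(num_workers=2)
    sess.init_ccl("nccl", 0, 1)  # assumes an NCCL-enabled build and GPUs 0, 1
    copies = broadcast_and_fetch(sess, np.arange(6, dtype="float32"), 2)
    for worker_id, copy in enumerate(copies):
        print(worker_id, copy.numpy())
    sess.shutdown()  # assumption: Session provides shutdown()
```

`main()` is intentionally not called here, since it needs a working multi-GPU TVM installation.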
- broadcast_from_worker0(src, dst, in_group=True)[source]#
Broadcast an array from worker-0 to all other workers.
Parameters#
- src: DRef
The array to be broadcast, residing on worker-0.
- dst: DRef
The output array on each worker.
- in_group: bool
Whether the broadcast operation is performed within each group (True, the default) or globally across all workers (False).
- call_packed(func, *args)[source]#
Call a PackedFunc on workers providing variadic arguments.
Parameters#
- func: PackedFunc
The function to be called.
- *args: various types
Supported types for the variadic arguments include integers and floating point numbers, DLDataType, DLDevice, str (std::string in C++), and DRef.
Returns#
- return_value: various types
The return value of the function call.
Notes#
Examples of unsupported types: NDArray and DLTensor; TVM Objects, including PackedFunc, Module and String.
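Combined with `get_global_func`, a call through `call_packed` might look like the sketch below. The helper `call_on_workers` is hypothetical, and so is the global function name used in the comment; only functions actually registered on the workers can be looked up.

```python
def call_on_workers(sess, func_name, *args):
    """Look up a global function on the workers and call it with args.

    args should be limited to the supported types listed above
    (numbers, DLDataType, DLDevice, str, DRef).
    """
    func = sess.get_global_func(func_name)  # handle to the worker-side function
    return sess.call_packed(func, *args)


# Hypothetical usage with a real session `sess`:
#     result = call_on_workers(sess, "my_pkg.add_one", 1)
# "my_pkg.add_one" is a made-up name and must be registered on each worker.
```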
- copy_from_worker_0(host_array, remote_array)[source]#
Copy an NDArray from worker-0 to the controller-side NDArray.
Parameters#
- host_array: numpy.ndarray
The controller-side array that receives the data copied from worker-0.
- remote_array: NDArray
The NDArray on worker-0.
- copy_to_worker_0(host_array, remote_array=None)[source]#
Copy the controller-side NDArray to worker-0.
Parameters#
- host_array: NDArray
The array to be copied to worker-0.
- remote_array: Optional[DRef]
The destination NDArray on worker-0.
Returns#
output_array: DRef
The DRef containing the copied data on worker0, and NullOpt on all other workers. If remote_array was provided, this return value is the same as remote_array. Otherwise, it is the newly allocated space.
- empty(shape, dtype, device=None, worker0_only=False, in_group=True)[source]#
Create an empty NDArray on all workers and attach them to a DRef.
Parameters#
- shape: tuple of int
The shape of the NDArray.
- dtype: str
The data type of the NDArray.
- device: Optional[Device] = None
The device of the NDArray.
- worker0_only: bool
If False (default), allocate an array on each worker. If True, only allocate an array on worker0.
- in_group: bool
Takes effect only when worker0_only is True. If True (default), allocate an array on the first worker of each group. If False, allocate an array only on worker0 globally.
Returns#
- array: DRef
The created NDArray.
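The interaction between `worker0_only` and `in_group` can be summarized by listing which workers end up with an allocation. This is a plain-Python sketch under the assumption that groups are contiguous and equally sized, so the first worker of group `g` has id `g * group_size`; `workers_allocating` is a hypothetical helper.

```python
def workers_allocating(num_workers, num_groups=1, worker0_only=False, in_group=True):
    """Return the ids of workers that would allocate an array."""
    if not worker0_only:
        return list(range(num_workers))  # every worker allocates
    if not in_group:
        return [0]  # only the global worker-0
    group_size = num_workers // num_groups
    # Assumed layout: the first worker of each contiguous group.
    return [g * group_size for g in range(num_groups)]


everyone = workers_allocating(4, num_groups=2)
group_leaders = workers_allocating(4, num_groups=2, worker0_only=True)
global_leader = workers_allocating(4, num_groups=2, worker0_only=True, in_group=False)
```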
- gather_to_worker0(from_array, to_array, in_group=True)[source]#
Gather an array from all other workers to worker-0.
Parameters#
- from_array: DRef
The array to be gathered from.
- to_array: DRef
The array to be gathered to.
- in_group: bool
Whether the gather operation is performed within each group (True, the default) or globally across all workers (False).
- get_global_func(name)[source]#
Get a global function on workers.
Parameters#
- name: str
The name of the global function.
Returns#
- func: DRef
The global packed function.
- import_python_module(module_name)[source]#
Import a Python module in each worker.
This may be required before call
Parameters#
module_name: str
The Python module name, as it would be used in a Python import statement.
- Parameters:
module_name (str)
- Return type:
None
- init_ccl(ccl, *device_ids)[source]#
Initialize the underlying collective communication library.
Parameters#
- ccl: str
The name of the collective communication library. Currently supported libraries are nccl, rccl, and mpi.
- *device_ids: int
The device IDs to be used by the underlying communication library.
- Parameters:
ccl (str)
- load_vm_module(path, device=None)[source]#
Load a VM module from a file.
Parameters#
- path: str
The path to the VM module file.
- device: Optional[Device] = None
The device to load the VM module to. Defaults to the default device of each worker.
Returns#
- module: DModule
The loaded VM module.
- scatter(src, dst=None, in_group=True)[source]#
Scatter an array across all workers.
Parameters#
- src: Union[np.ndarray, NDArray]
The array to be scattered. The first dimension of this array, src.shape[0], must be equal to the number of workers.
- dst: Optional[DRef]
The output array. If None, an array with compatible shape and the same dtype as src will be allocated on each worker.
- in_group: bool
Whether the scatter operation is performed within each group (True, the default) or globally across all workers (False).
Returns#
output_array: DRef
The DRef containing the scattered data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.
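The shape requirement on `src` can be made concrete with a small model in plain Python. It assumes that worker `i` receives `src[i]`, so a `src` of shape `(num_workers, ...)` yields one shard of shape `(...)` per worker; groups are ignored, and `toy_scatter` is a hypothetical helper rather than the Disco implementation.

```python
def toy_scatter(src, num_workers):
    """Split src along its first dimension, one slice per worker."""
    if len(src) != num_workers:
        raise ValueError(
            f"src.shape[0] is {len(src)}, but there are {num_workers} workers"
        )
    # Worker i receives the i-th slice along the leading axis.
    return [src[worker_id] for worker_id in range(num_workers)]


src = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]  # shape (2, 2, 2)
shards = toy_scatter(src, num_workers=2)     # two shards of shape (2, 2)
```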
- scatter_from_worker0(from_array, to_array, in_group=True)[source]#
Scatter an array from worker-0 to all other workers.
Parameters#
- from_array: DRef
The array to be scattered from worker-0. The first dimension of this array, from_array.shape[0], must be equal to the number of workers.
- to_array: DRef
The output array on each worker.
- in_group: bool
Whether the scatter operation is performed within each group (True, the default) or globally across all workers (False).
- class tvm.runtime.disco.SocketSession(num_nodes, num_workers_per_node, num_groups, host, port)[source]#
A Disco session backed by socket-based multi-node communication.