mxnet.kvstore.horovod
Key-value store interface of MXNet for Horovod.
Classes
Horovod: A communication backend using Horovod.
- class mxnet.kvstore.horovod.Horovod
Bases: KVStoreBase
A communication backend using Horovod.
- broadcast(key, value, out, priority=0)
Broadcasts the value NDArray at rank 0 to all ranks.
- Parameters:
key (str or int) – The key used to name the tensor for the operation. Its usage is different from that of parameter servers.
value (NDArray) – The tensor to be broadcast.
out (NDArray, list of NDArray) – Output tensor that receives the value broadcast from the root process.
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.
Examples
>>> shape = (2, 3)
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.broadcast('2', value=a, out=b)
>>> print(b.asnumpy())
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
- static is_capable(capability)
Queries whether the KVStore type supports a given capability, such as an optimizer algorithm, gradient compression, or sparsity.
- load_optimizer_states(fname)
Loads the optimizer (updater) state from the file.
- Parameters:
fname (str) – Path to input states file.
- property num_workers
Returns the number of worker nodes.
- Returns:
size – The number of worker nodes.
- Return type:
int
- pushpull(key, value, out=None, priority=0)
Performs allreduce on a single tensor or a list of tensors.
This function sums the input tensor over all processes. The summation is done in place unless out is given.
The name pushpull is a generic term. In Horovod, its action is implemented via ring allreduce. Each operation is identified by the ‘key’; if key is not provided, an incremented auto-generated name is used. The tensor type and shape must be the same on all processes for a given name. The reduction will not start until all processes are ready to send and receive the tensor.
- Parameters:
key (str, int, or sequence of str or int) – Keys used to uniquely tag an operation.
value (NDArray) – Tensor value on one process to be summed. If out is not specified, value is modified in place.
out (NDArray) – Output tensor after allreduce. If not specified, the input tensor value is modified in place.
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.
Examples
>>> # perform in-place allreduce on tensor a
>>> shape = (2, 3)
>>> nworker = kv.num_workers # assume there are 8 processes
>>> a = mx.nd.ones(shape)
>>> kv.pushpull('1', a)
>>> print(a.asnumpy())
[[ 8.  8.  8.]
 [ 8.  8.  8.]]
>>> # perform allreduce on tensor a and output to b
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.pushpull('2', a, out=b)
>>> print(b.asnumpy())
[[ 8.  8.  8.]
 [ 8.  8.  8.]]
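The ring allreduce mentioned above can be illustrated without any distributed setup. The following is a minimal single-process sketch in pure Python, not Horovod's implementation: the function name `ring_allreduce` and the use of plain lists in place of NDArrays are assumptions made for illustration. Each simulated worker passes one chunk to its right-hand neighbour per step, first accumulating partial sums (reduce-scatter) and then circulating the finished chunks (allgather).

```python
def ring_allreduce(tensors):
    """Simulate a sum allreduce over a ring of workers.

    `tensors` holds one flat list of numbers per simulated worker.
    Returns a new list in which every worker holds the element-wise sum.
    Hypothetical helper for illustration; not part of MXNet or Horovod.
    """
    n = len(tensors)
    length = len(tensors[0])
    # Mirrors the documented rule: the shape must match on all processes.
    if any(len(t) != length for t in tensors):
        raise ValueError("all workers must hold tensors of the same shape")

    data = [list(t) for t in tensors]  # work on copies
    bounds = [(i * length) // n for i in range(n + 1)]
    chunks = [range(bounds[i], bounds[i + 1]) for i in range(n)]

    # Phase 1, reduce-scatter: at step s, worker r sends chunk (r - s) % n
    # to worker (r + 1) % n, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r - step) % n
            sends.append(((r + 1) % n, c, [data[r][i] for i in chunks[c]]))
        for dst, c, payload in sends:  # apply all transfers of this step
            for i, v in zip(chunks[c], payload):
                data[dst][i] += v

    # Worker r now holds the complete sum for chunk (r + 1) % n.
    # Phase 2, allgather: at step s, worker r forwards chunk (r + 1 - s) % n,
    # and the receiver overwrites its copy with the finished values.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - step) % n
            sends.append(((r + 1) % n, c, [data[r][i] for i in chunks[c]]))
        for dst, c, payload in sends:
            for i, v in zip(chunks[c], payload):
                data[dst][i] = v

    return data


# Eight simulated workers, each holding a tensor of ones.
result = ring_allreduce([[1.0] * 6 for _ in range(8)])
print(result[0])  # [8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
```

Every worker sends and receives only one chunk per step, which is why the ring layout keeps per-worker traffic roughly constant as the number of workers grows.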
- property rank
Returns the rank of this worker node.
- Returns:
rank – The rank of this node, which is in the range [0, num_workers).
- Return type:
int
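A common use of rank together with num_workers is to give each worker a disjoint slice of the training data. The sketch below shows one way this could be done. The helper `shard_for_rank` is hypothetical and not part of MXNet; in a real job the two integer arguments would come from `kv.rank` and `kv.num_workers`.

```python
def shard_for_rank(num_samples, rank, num_workers):
    """Return the sample indices assigned to one worker.

    Hypothetical helper for illustration: indices are dealt out
    round-robin, so worker `rank` gets rank, rank + num_workers, ...
    """
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    return list(range(rank, num_samples, num_workers))


# Ten samples split across four workers; worker 1 gets every fourth index.
print(shard_for_rank(10, rank=1, num_workers=4))  # [1, 5, 9]
```

Round-robin assignment keeps shard sizes within one sample of each other even when the sample count is not divisible by the number of workers.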
- save_optimizer_states(fname, dump_optimizer=False)
Saves the optimizer (updater) state to a file. This is often used when checkpointing the model during training.
- set_optimizer(optimizer)
Registers an optimizer with the kvstore.
When using a single machine, this function updates the local optimizer. If using multiple machines and this operation is invoked from a worker node, it serializes the optimizer with pickle and sends it to all servers. The function returns after all servers have been updated.
- Parameters:
optimizer (Optimizer) – The new optimizer for the store.
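The pickle-based transfer described for set_optimizer amounts to turning an object into bytes on one side and rebuilding an equal copy on the other. The sketch below shows only that round trip. The dictionary standing in for an optimizer and the helper name `roundtrip` are assumptions for illustration; it does not reproduce the format MXNet actually sends.

```python
import pickle


def roundtrip(state):
    """Serialize `state` to bytes and rebuild it, as a receiver would.

    Hypothetical helper for illustration; a plain dict stands in
    for a real optimizer object.
    """
    payload = pickle.dumps(state)  # bytes that could be sent or written to a file
    return pickle.loads(payload)   # independent copy rebuilt from the bytes


# Stand-in for optimizer settings; the field names are illustrative.
optimizer_state = {"name": "sgd", "learning_rate": 0.1, "momentum": 0.9}
restored = roundtrip(optimizer_state)
print(restored == optimizer_state)  # True
print(restored is optimizer_state)  # False
```

The restored object is equal to the original but is a separate copy, so later changes on the sending side are not seen by the receiver until the optimizer is registered again.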