mxnet.kvstore.horovod

Key-value store interface of MXNet for Horovod.

Classes

Horovod()

A communication backend using Horovod.

class mxnet.kvstore.horovod.Horovod

Bases: KVStoreBase

A communication backend using Horovod.

broadcast(key, value, out, priority=0)

Broadcasts the value NDArray at rank 0 to all ranks.

Parameters:

key (str or int) – Names the tensor for the operation. Its usage differs from that of keys in parameter servers.
value (NDArray) – The tensor to be broadcast.
out (NDArray or list of NDArray) – Output tensor(s) that receive the value broadcast from the root process.
priority (int, optional) – The priority of the operation. Higher-priority operations are likely to be executed before others.

Examples

>>> shape = (2, 3)
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.broadcast('2', value=a, out=b)
>>> print(b.asnumpy())
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
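The broadcast semantics described above can be illustrated without a cluster. The following is a minimal pure-Python sketch that simulates several ranks in one process; it does not call MXNet or Horovod, and `simulate_broadcast` is a hypothetical helper introduced only for illustration.

```python
# Pure-Python simulation of broadcast semantics; no MXNet or Horovod involved.
# `simulate_broadcast` is a hypothetical helper, not a library function.

def simulate_broadcast(values_per_rank, root_rank=0):
    """Return the per-rank outputs after broadcasting the root's value.

    values_per_rank[r] is the nested-list "tensor" held by rank r.
    """
    root_value = values_per_rank[root_rank]
    # Every rank receives its own copy of the root's rows.
    return [[list(row) for row in root_value] for _ in values_per_rank]


# Rank 0 holds ones; ranks 1..3 hold zeros of the same 2x3 shape.
ones = [[1.0] * 3 for _ in range(2)]
inputs = [ones] + [[[0.0] * 3 for _ in range(2)] for _ in range(3)]

outputs = simulate_broadcast(inputs)
for rank, out in enumerate(outputs):
    print(rank, out)  # each rank now holds the ones tensor from rank 0
```

In a real job each process would hold only its own `value` and `out`; the list of ranks here exists purely to make the before and after states visible in one place.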

static is_capable(capability)

Queries whether the KVStore type supports a certain capability, such as optimizer algorithm, gradient compression, or sparsity.

Parameters: capability (str) – The capability to query.
Returns: result – Whether the capability is supported or not.
Return type: bool

load_optimizer_states(fname)

Loads the optimizer (updater) state from the file.

Parameters: fname (str) – Path to the input states file.

property num_workers

Returns the number of worker nodes.

Returns: size – The number of worker nodes.
Return type: int

pushpull(key, value, out=None, priority=0)

Performs allreduce on a single tensor or a list of tensor objects.

This function performs in-place summation of the input tensor over all the processes.

The name pushpull is a generic term. In Horovod, its action is implemented via ring allreduce. Each operation is identified by the ‘key’; if key is not provided, an incremented auto-generated name is used. The tensor type and shape must be the same on all processes for a given name. The reduction will not start until all processes are ready to send and receive the tensor.

Parameters:

key (str, int, or sequence of str or int) – Keys used to uniquely tag an operation.
value (NDArray) – Tensor value on one process to be summed. If out is not specified, value is modified in place.
out (NDArray) – Output tensor after allreduce. If not specified, the input tensor value is modified in place.
priority (int, optional) – The priority of the operation. Higher-priority operations are likely to be executed before others.

Examples

>>> # perform in-place allreduce on tensor a
>>> shape = (2, 3)
>>> nworker = kv.num_workers # assume there are 8 processes
>>> a = mx.nd.ones(shape)
>>> kv.pushpull('1', a)
>>> print(a.asnumpy())
[[ 8.  8.  8.]
 [ 8.  8.  8.]]

>>> # perform allreduce on tensor a and output to b
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.pushpull('2', a, out=b)
>>> print(b.asnumpy())
[[ 8.  8.  8.]
 [ 8.  8.  8.]]
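To make the ring allreduce mentioned above more concrete, here is a minimal single-process sketch in pure Python. It simulates the two textbook phases (reduce-scatter, then all-gather) over flat lists and is an illustration only: it is not Horovod's implementation, it does not call MXNet, and `ring_allreduce` is a hypothetical name.

```python
# Single-process simulation of a textbook ring allreduce (sum).
# Hypothetical helper for illustration; not Horovod's actual code.

def ring_allreduce(buffers):
    """Sum equal-length lists across simulated workers.

    buffers[r] is the flat data held by worker r. Returns new per-worker
    lists, each containing the element-wise sum over all workers.
    """
    n = len(buffers)
    size = len(buffers[0])
    data = [list(b) for b in buffers]
    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(i * size) // n for i in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    def exchange(first_chunk, accumulate):
        # n - 1 steps; in each step worker r sends one chunk to worker r + 1.
        for step in range(n - 1):
            # Snapshot all outgoing messages before applying any of them.
            sends = []
            for r in range(n):
                c = (r + first_chunk - step) % n
                sends.append((c, [data[r][i] for i in chunk(c)]))
            for r, (c, payload) in enumerate(sends):
                dst = (r + 1) % n
                for i, v in zip(chunk(c), payload):
                    data[dst][i] = data[dst][i] + v if accumulate else v

    # Phase 1, reduce-scatter: worker r ends up owning the full sum
    # of chunk (r + 1) % n.
    exchange(first_chunk=0, accumulate=True)
    # Phase 2, all-gather: the completed chunks travel around the ring.
    exchange(first_chunk=1, accumulate=False)
    return data


# Eight simulated workers, each holding a flattened 2x3 tensor of ones.
workers = [[1.0] * 6 for _ in range(8)]
result = ring_allreduce(workers)
print(result[0])  # [8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
```

Each worker sends and receives only one chunk per step, which is why the per-worker traffic in this scheme stays roughly constant as the number of workers grows.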

property rank

Returns the rank of this worker node.

Returns: rank – The rank of this node, which is in the range [0, num_workers).
Return type: int
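A common use of a rank in [0, num_workers) is to give each worker a disjoint slice of the training data. The sketch below shows one simple strided split in pure Python; `shard_indices` is a hypothetical helper, and in a real job the `rank` and `num_workers` arguments would come from the properties documented here rather than from a loop.

```python
# Hypothetical helper: assign sample indices to workers by rank.

def shard_indices(num_samples, rank, num_workers):
    """Return the sample indices assigned to `rank` using a strided split."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    return list(range(rank, num_samples, num_workers))


# Simulate four workers splitting ten samples between them.
num_workers = 4
shards = [shard_indices(10, r, num_workers) for r in range(num_workers)]
for r, shard in enumerate(shards):
    print(r, shard)
```

The shards are disjoint and together cover every sample, though their sizes can differ by one when the sample count is not a multiple of the worker count.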

save_optimizer_states(fname, dump_optimizer=False)

Saves the optimizer (updater) state to a file. This is often used when checkpointing the model during training.

Parameters:

fname (str) – Path to the output states file.
dump_optimizer (bool, default False) – Whether to also save the optimizer itself. This also saves optimizer information such as the learning rate and weight decay schedules.

set_optimizer(optimizer)

Registers an optimizer with the kvstore.

When using a single machine, this function updates the local optimizer. When using multiple machines, if this operation is invoked from a worker node, it serializes the optimizer with pickle and sends it to all servers. The function returns after all servers have been updated.

Parameters: optimizer (Optimizer) – The new optimizer for the store.

property type

Returns the type of this kvstore backend.

Returns: type – The string type of the backend.
Return type: str