mxnet.kvstore.base

Key-value store interface of MXNet for parameter synchronization.

Functions

create([name])

Creates a new KVStore.

Classes

`KVStoreBase`()	An abstract key-value store interface for data parallel training.
`TestStore`()	A key-value store for testing.

class mxnet.kvstore.base.KVStoreBase

Bases: object

An abstract key-value store interface for data parallel training.

broadcast(key, value, out, priority=0)

Broadcasts the value NDArray at rank 0 to all ranks and stores the result in out.

Parameters:

key (str or int) – The key.
value (NDArray) – The value corresponding to the key to broadcast.
out (NDArray, or list of NDArray) – Output array or arrays in which the broadcast result for the key is stored.
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.
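
The broadcast semantics described above can be illustrated without MXNet. The following is a minimal pure-Python sketch, assuming each rank is modeled as one entry in a list and an NDArray as a plain list of floats. `simulate_broadcast` is a hypothetical helper for illustration only and is not part of the MXNet API.

```python
from typing import List


def simulate_broadcast(values_per_rank: List[List[float]]) -> List[List[float]]:
    """Model of broadcast: every rank receives a copy of rank 0's value.

    values_per_rank[i] is the `value` held by rank i. Plain lists stand in
    for NDArrays (an assumption of this sketch).
    """
    root = values_per_rank[0]
    # Each rank gets its own copy, so later in-place edits do not alias.
    return [list(root) for _ in values_per_rank]


outs = simulate_broadcast([[1.0, 2.0], [0.0, 0.0], [9.0, 9.0]])
print(outs)
```

Whatever the other ranks held before the call is ignored; only the value at rank 0 determines the result.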

is_capable(capability)

Queries whether the KVStore type supports a certain capability, such as an optimizer algorithm, gradient compression, or sparsity.

Parameters: capability (str) – The capability to query.
Returns: result – Whether the capability is supported or not.
Return type: bool

load_optimizer_states(fname)

Loads the optimizer (updater) state from the file.

Parameters: fname (str) – Path to input states file.

property num_workers

Returns the number of worker nodes.

Returns: size – The number of worker nodes.
Return type: int

pushpull(key, value, out=None, priority=0)

Pushes a single value or a sequence of values to the store and pulls the result back.

This function is a coalesced form of the push and pull operations.

value is pushed to the kvstore server for summation with the specified keys, and the results are pulled from the server to out. If out is not specified, the pulled values are written to value.

Note that for allreduce based approaches such as horovod, there is no notion of server or store. This function performs allreduce.

Parameters:

key (str or int) – The key.
value (NDArray, or list of NDArray) – Values corresponding to the key to push.
out (NDArray, or list of NDArray) – Output array or arrays in which the pulled result for the key is stored. If omitted, the result is written to value.
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.
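
The sum-then-distribute behavior of pushpull can be sketched in pure Python. This is a simplified model under stated assumptions: each worker's contribution is a plain list of floats, all workers live in one process, and `simulate_pushpull` is a hypothetical helper rather than an MXNet function.

```python
from typing import List, Optional


def simulate_pushpull(values: List[List[float]],
                      outs: Optional[List[List[float]]] = None) -> List[List[float]]:
    """Model of pushpull: sum `values` across workers, write the sum to each `out`.

    values[i] is worker i's contribution. If `outs` is None, the result is
    written back into `values`, mirroring the documented default.
    """
    # Element-wise sum over workers (the "push" / reduction step).
    total = [sum(column) for column in zip(*values)]
    targets = values if outs is None else outs
    # Every worker receives the same reduced result (the "pull" step).
    for target in targets:
        target[:] = total
    return targets


grads = [[1.0, 2.0], [3.0, 4.0]]
simulate_pushpull(grads)  # out omitted: grads are overwritten in place
print(grads)
```

In an allreduce-based backend the same result is obtained without a server; the model above only captures the observable outcome, not the communication pattern.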

property rank

Returns the rank of this worker node.

Returns: rank – The rank of this node, which is in the range [0, num_workers).
Return type: int
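
One common use of rank together with num_workers is to give each worker a disjoint share of the training data. The sketch below shows a strided split in pure Python. `shard_for_worker` is a hypothetical helper, and in real code the two integers would presumably come from the rank and num_workers properties of a KVStore instance.

```python
from typing import List, Sequence


def shard_for_worker(samples: Sequence[int], rank: int, num_workers: int) -> List[int]:
    """Return the samples assigned to `rank` using a strided split."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must satisfy 0 <= rank < num_workers")
    # Worker r takes items r, r + num_workers, r + 2 * num_workers, ...
    return list(samples[rank::num_workers])


samples = list(range(10))
shards = [shard_for_worker(samples, r, 3) for r in range(3)]
print(shards)
```

Because rank is always below num_workers, every sample lands in exactly one shard.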

static register(klass)

Registers a new KVStore. Once a kvstore is registered, we can create an instance of this kvstore with create later.

Examples

>>> import mxnet as mx
>>> @mx.kvstore.KVStoreBase.register
... class MyKVStore(mx.kvstore.KVStoreBase):
...     pass
>>> kv = mx.kv.create('MyKVStore')
>>> print(type(kv))
<class '__main__.MyKVStore'>
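
The register-then-create flow follows a general class-registry pattern, which can be sketched without MXNet. The code below is an illustration of that pattern, not MXNet's actual implementation: `StoreBase`, `MyStore`, the module-level `create`, and the choice to key classes by lowercased name are all assumptions made for this example.

```python
from typing import Dict, Type


class StoreBase:
    """Hypothetical stand-in for an abstract store base class."""

    _registry: Dict[str, Type["StoreBase"]] = {}

    @staticmethod
    def register(klass: Type["StoreBase"]) -> Type["StoreBase"]:
        # Key the class by its lowercased name (an assumption of this sketch).
        StoreBase._registry[klass.__name__.lower()] = klass
        return klass  # returning the class lets register act as a decorator


def create(name: str) -> StoreBase:
    """Look up a registered class by name and instantiate it."""
    try:
        return StoreBase._registry[name.lower()]()
    except KeyError:
        raise ValueError(f"unknown store type: {name!r}") from None


@StoreBase.register
class MyStore(StoreBase):
    pass


store = create("MyStore")
print(type(store).__name__)
```

Returning the class unchanged from register is what allows it to be used as a decorator, as in the example above.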

save_optimizer_states(fname, dump_optimizer=False)

Saves the optimizer (updater) state to a file. This is often used when checkpointing the model during training.

Parameters:

fname (str) – Path to the output states file.
dump_optimizer (bool, default False) – Whether to also save the optimizer itself. When True, optimizer information such as learning rate and weight decay schedules is saved as well.

set_optimizer(optimizer)

Registers an optimizer with the kvstore.

When using a single machine, this function updates the local optimizer. If using multiple machines and this operation is invoked from a worker node, it serializes the optimizer with pickle and sends it to all servers. The function returns after all servers have been updated.

Parameters: optimizer (Optimizer) – The new optimizer for the store.

property type

Returns the type of this kvstore backend.

Returns: type – The type of the backend, as a string.
Return type: str

mxnet.kvstore.base.create(name='local')

Creates a new KVStore.

For single machine training, there are two commonly used types:

local: Copies all gradients to CPU memory and updates weights there.

device: Aggregates gradients and updates weights on GPUs. With this setting, the KVStore also attempts to use GPU peer-to-peer communication, potentially accelerating the communication.

For distributed training, KVStore also supports a number of types:

dist_sync: Behaves similarly to local, with one major difference: the batch size now refers to the batch size used on each machine. So if there are n machines and each uses batch size b, dist_sync behaves like local with batch size n * b.

dist_device_sync: Identical to dist_sync, except that gradients are aggregated and weights are updated on GPUs, in the same way that device differs from local.

dist_async: Performs asynchronous updates. The weights are updated whenever gradients are received from any machine. No two updates happen on the same weight at the same time. However, the order is not guaranteed.

byteps: Uses BytePS as the broadcast/pushpull backend. This kind of kvstore does not store weights, so there is no optimizer on the kvstore server. BytePS does not support pure CPU training, so be sure to enable GPU training when using this kvstore.
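
The batch-size remark for dist_sync above amounts to simple arithmetic, shown here as a pure-Python sketch. It assumes gradients are summed across machines in each synchronous step and uses scalar per-sample gradients for brevity. `effective_batch_size` and the sample numbers are hypothetical and purely illustrative.

```python
def effective_batch_size(per_machine_batch: int, num_machines: int) -> int:
    """Global batch size covered by one synchronous update."""
    return per_machine_batch * num_machines


# Per-sample gradients on n = 2 machines, each with batch size b = 2.
batches = [[0.5, 1.5], [1.0, 3.0]]

machine_sums = [sum(batch) for batch in batches]   # what each machine pushes
aggregated = sum(machine_sums)                     # sum held after the synchronous step
combined = sum(g for batch in batches for g in batch)  # one machine with batch n * b

print(effective_batch_size(2, 2), aggregated, combined)
```

The aggregated value matches the sum a single machine would compute over all n * b samples, which is why learning-rate or normalization settings tuned for local may need revisiting when moving to dist_sync.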

Parameters: name ({'local', 'device', 'nccl', 'dist_sync', 'dist_device_sync', 'dist_async', 'horovod', 'byteps'}) – The type of KVStore.
Returns: kv – The created KVStore.
Return type: KVStoreBase