mxnet.kvstore.base
Key value store interface of MXNet for parameter synchronization.
Functions
create | Creates a new KVStore.
Classes
KVStoreBase | An abstract key-value store interface for data parallel training.
A key-value store for testing.
- class mxnet.kvstore.base.KVStoreBase
Bases: object
An abstract key-value store interface for data parallel training.
- broadcast(key, value, out, priority=0)
Broadcast the value NDArray at rank 0 to all ranks, and store the result in out.
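The data movement described for broadcast can be illustrated without MXNet. The following is a minimal pure-Python sketch, assuming each rank holds its value as a plain list; `simulate_broadcast` is a hypothetical helper introduced here for illustration and is not part of the MXNet API.

```python
from typing import Dict, List


def simulate_broadcast(values_by_rank: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Return the `out` buffer each rank would hold after a broadcast.

    Hypothetical helper: it models only the data movement, not MXNet itself.
    """
    root_value = values_by_rank[0]  # only the value held by rank 0 is used
    # Every rank, including rank 0, receives its own copy of that value.
    return {rank: list(root_value) for rank in values_by_rank}


values = {0: [1.0, 2.0], 1: [9.0, 9.0], 2: [0.0, 0.0]}
outs = simulate_broadcast(values)
```

In this sketch the values originally held by ranks 1 and 2 are ignored, which mirrors the description above: only the value at rank 0 is propagated.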
- is_capable(capability)
Queries whether the KVStore type supports a certain capability, such as optimizer algorithm, gradient compression, sparsity, etc.
- load_optimizer_states(fname)
Loads the optimizer (updater) state from the file.
- Parameters:
fname (str) – Path to input states file.
- property num_workers
Returns the number of worker nodes.
- Returns:
size – The number of worker nodes.
- Return type:
- pushpull(key, value, out=None, priority=0)
Pushes and pulls a single value or a sequence of values to and from the store.
This function is a coalesced form of the push and pull operations.
value is pushed to the kvstore server for summation with the specified keys, and the results are pulled from the server to out. If out is not specified, the pulled values are written to value.
Note that for allreduce-based approaches such as horovod, there is no notion of server or store. In that case, this function performs allreduce.
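The sum-then-distribute behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not MXNet code: `simulate_pushpull` is a hypothetical helper, each worker's value is modeled as a list of floats, and all workers are simulated in a single process.

```python
from typing import List, Optional


def simulate_pushpull(values: List[List[float]],
                      outs: Optional[List[List[float]]] = None) -> None:
    """Sum one value per worker element-wise and deliver the sum to every worker.

    Hypothetical helper. values[i] is worker i's input and outs[i] is worker i's
    output buffer. If outs is None, the result overwrites values in place.
    """
    # "Push": aggregate the contributions of all workers by summation.
    total = [sum(column) for column in zip(*values)]
    # "Pull": write the aggregated result to each worker's target buffer.
    targets = values if outs is None else outs
    for buffer in targets:
        buffer[:] = total


# With explicit output buffers, the inputs are left untouched.
grads = [[1.0, 2.0], [3.0, 4.0]]
outs = [[0.0, 0.0], [0.0, 0.0]]
simulate_pushpull(grads, outs)

# Without output buffers, the summed result replaces the inputs.
in_place = [[1.0, 2.0], [3.0, 4.0]]
simulate_pushpull(in_place)
```

The end state is the same whether the sum is computed on a server or through allreduce among the workers: every worker holds the element-wise sum.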
- property rank
Returns the rank of this worker node.
- Returns:
rank – The rank of this node, which is in the range [0, num_workers)
- Return type:
- static register(klass)
Registers a new KVStore. Once a kvstore is registered, we can create an instance of this kvstore with create later.
Examples
>>> import mxnet as mx
>>> @mx.kvstore.KVStoreBase.register
... class MyKVStore(mx.kvstore.KVStoreBase):
...     pass
>>> kv = mx.kv.create('MyKVStore')
>>> print(type(kv))
<class '__main__.MyKVStore'>
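To show how a register-then-create pattern of this kind can work in general, here is a minimal sketch that runs without MXNet. All names in it (`ToyStoreBase`, `toy_create`, `MyStore`) are hypothetical, and the case-insensitive lookup is an assumption of this sketch rather than a statement about MXNet's implementation.

```python
from typing import Dict, Type


class ToyStoreBase:
    """Stand-in for a key-value store base class (hypothetical, not MXNet code)."""

    _registry: Dict[str, Type["ToyStoreBase"]] = {}

    @staticmethod
    def register(klass):
        # Record the class under its lowercased name, then return it unchanged
        # so that the function can be used as a class decorator.
        ToyStoreBase._registry[klass.__name__.lower()] = klass
        return klass


def toy_create(name: str) -> ToyStoreBase:
    """Instantiate a registered class by name (case-insensitive by assumption)."""
    try:
        klass = ToyStoreBase._registry[name.lower()]
    except KeyError:
        raise ValueError(f"unknown store type: {name!r}") from None
    return klass()


@ToyStoreBase.register
class MyStore(ToyStoreBase):
    pass


store = toy_create("MyStore")
```

Returning the class from the decorator is what keeps `MyStore` usable as an ordinary class after registration.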
- save_optimizer_states(fname, dump_optimizer=False)
Saves the optimizer (updater) state to a file. This is often used when checkpointing the model during training.
- set_optimizer(optimizer)
Registers an optimizer with the kvstore.
When using a single machine, this function updates the local optimizer. When using multiple machines, if this operation is invoked from a worker node, it serializes the optimizer with pickle and sends it to all servers. The function returns after all servers have been updated.
- Parameters:
optimizer (Optimizer) – The new optimizer for the store.
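The serialize-and-send step described for the multi-machine case can be sketched with the standard pickle module. This is an illustration only: `send_optimizer_to_servers` is a hypothetical helper, a plain dict of hyperparameters stands in for a real optimizer object, and the network transfer is replaced by an in-process loop.

```python
import pickle
from typing import Dict, List


def send_optimizer_to_servers(optimizer_config: Dict[str, float],
                              num_servers: int) -> List[Dict[str, float]]:
    """Serialize once on the worker, then deserialize one copy per server.

    Hypothetical helper: a dict stands in for the optimizer, and the loop
    stands in for sending the payload to each server.
    """
    payload = pickle.dumps(optimizer_config)  # bytes a worker might send
    # Each server rebuilds its own independent copy from the same bytes.
    # Only unpickle data from a trusted source.
    return [pickle.loads(payload) for _ in range(num_servers)]


config = {"learning_rate": 0.1, "momentum": 0.9}
server_copies = send_optimizer_to_servers(config, num_servers=3)
```

Because each server deserializes its own copy, later changes to the worker's object are not visible on the servers unless the optimizer is sent again.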
- mxnet.kvstore.base.create(name='local')
Creates a new KVStore.
For single machine training, there are two commonly used types:
- local: Copies all gradients to CPU memory and updates weights there.
- device: Aggregates gradients and updates weights on GPUs. With this setting, the KVStore also attempts to use GPU peer-to-peer communication, potentially accelerating the communication.
For distributed training, KVStore also supports a number of types:
- dist_sync: Behaves similarly to local but with one major difference. With dist_sync, batch-size now means the batch size used on each machine. So if there are n machines and we use batch size b, then dist_sync behaves like local with batch size n * b.
- dist_device_sync: Identical to dist_sync, except that it differs from it in the same way device differs from local.
- dist_async: Performs asynchronous updates. The weights are updated whenever gradients are received from any machine. No two updates happen on the same weight at the same time. However, the order is not guaranteed.
- byteps: Uses byteps as the broadcast/pushpull backend. This kind of kvstore doesn’t store weights, so there is no optimizer in this kvstore server. Byteps doesn’t support pure CPU training, so be sure to enable GPU training when using this kvstore.
- Parameters:
name ({'local', 'device', 'nccl', 'dist_sync', 'dist_device_sync', 'dist_async', 'horovod', 'byteps'}) – The type of KVStore.
- Returns:
kv – The created KVStore.
- Return type:
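As a worked example of the dist_sync batch-size rule stated above (n machines with per-machine batch size b behave like local with batch size n * b), the following sketch computes the conversion in both directions. The function names are hypothetical helpers introduced for illustration and are not part of MXNet.

```python
def effective_batch_size(num_machines: int, per_machine_batch: int) -> int:
    """Global batch size under the dist_sync rule: n * b."""
    if num_machines < 1 or per_machine_batch < 1:
        raise ValueError("both arguments must be positive")
    return num_machines * per_machine_batch


def per_machine_batch_for(global_batch: int, num_machines: int) -> int:
    """Per-machine batch size that reproduces a given single-machine batch size."""
    if num_machines < 1 or global_batch < 1:
        raise ValueError("both arguments must be positive")
    if global_batch % num_machines != 0:
        raise ValueError("global batch must divide evenly across machines")
    return global_batch // num_machines


# Four machines, each using batch size 32, behave like local with batch size 128.
total = effective_batch_size(num_machines=4, per_machine_batch=32)
# To match a single-machine batch size of 128 on four machines, use 32 per machine.
per_machine = per_machine_batch_for(global_batch=128, num_machines=4)
```

This matters when moving a training script from local to dist_sync: keeping the same batch-size setting on every machine multiplies the effective batch size by the number of machines.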