Simulating PPFL

This package allows users to simulate PPFL on either a single machine or a cluster.

Note

Running (either training or simulating) PPFL on multiple heterogeneous machines is described in Training PPFL.

We describe how to simulate PPFL with a given model and datasets. For simulation, we assume that test_data is available to validate the training.

Serial run

Serial runs begin simply by calling the following API function.

appfl.run_serial.run_serial(cfg: omegaconf.DictConfig, model: torch.nn.Module, loss_fn: torch.nn.Module, train_data: Dataset, test_data: Dataset = torch.utils.data.Dataset, dataset_name: str = 'appfl')[source]

Run serial simulation of PPFL.

Parameters:
  • cfg (DictConfig) – the configuration for this run

  • model (nn.Module) – neural network model to train

  • loss_fn (nn.Module) – loss function

  • train_data (Dataset) – training data

  • test_data (Dataset) – optional testing data. If given, validation will run based on this data.

  • dataset_name (str) – optional dataset name

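As an illustration, the following is a minimal sketch of a serial run. The toy model, the datasets, and the configuration are assumptions made for the example: we assume the default configuration can be built from the appfl.config.Config dataclass (as in the package examples), that cfg.num_clients holds the number of simulated clients, and that one training Dataset is supplied per simulated client.

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
from omegaconf import OmegaConf

import appfl.run_serial
from appfl.config import Config  # assumption: default configuration dataclass

cfg = OmegaConf.structured(Config)  # default configuration for this run
cfg.num_clients = 2                 # assumed field: number of simulated clients

model = nn.Linear(10, 2)            # toy model
loss_fn = nn.CrossEntropyLoss()

# One toy training dataset per simulated client, plus a test set for validation.
train_data = [
    TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
    for _ in range(cfg.num_clients)
]
test_data = TensorDataset(torch.randn(16, 10), torch.randint(0, 2, (16,)))

appfl.run_serial.run_serial(cfg, model, loss_fn, train_data, test_data, "toy")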

Parallel run with MPI

We can parallelize the PPFL simulation by using MPI through the mpi4py package. The following two API functions need to be called for parallelization.

appfl.run_mpi.run_server(cfg: omegaconf.DictConfig, comm: mpi4py.MPI.Comm, model: torch.nn.Module, loss_fn: torch.nn.Module, num_clients: int, test_dataset: Dataset = torch.utils.data.Dataset, dataset_name: str = 'appfl')[source]

Run the PPFL simulation server, which aggregates and updates the global parameters of the model.

Parameters:
  • cfg (DictConfig) – the configuration for this run

  • comm – MPI communicator

  • model (nn.Module) – neural network model to train

  • loss_fn (nn.Module) – loss function

  • num_clients (int) – the number of clients used in PPFL simulation

  • test_dataset (Dataset) – optional testing data. If given, validation will run based on this data.

  • dataset_name (str) – optional dataset name

appfl.run_mpi.run_client(cfg: omegaconf.DictConfig, comm: mpi4py.MPI.Comm, model: torch.nn.Module, loss_fn: torch.nn.Module, num_clients: int, train_data: Dataset, test_data: Dataset = torch.utils.data.Dataset)[source]

Run the PPFL simulation clients, each of which updates its own local parameters of the model.

Parameters:
  • cfg (DictConfig) – the configuration for this run

  • comm – MPI communicator

  • model (nn.Module) – neural network model to train

  • loss_fn (nn.Module) – loss function

  • num_clients (int) – the number of clients used in PPFL simulation

  • train_data (Dataset) – training data

  • test_data (Dataset) – optional testing data. If given, validation will run based on this data.

The server and the clients are started by calling run_server and run_client, respectively, where an MPI communicator (e.g., MPI.COMM_WORLD in the sketch below) is given as an argument.
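The following is a minimal sketch of an MPI-parallel run; as in the serial sketch above, the toy model, datasets, and the appfl.config.Config default configuration are assumptions made for the example. Rank 0 runs the server and every other rank runs one client, and each client is passed its own local Dataset, matching the signatures above.

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
from mpi4py import MPI
from omegaconf import OmegaConf

import appfl.run_mpi
from appfl.config import Config  # assumption: default configuration dataclass

comm = MPI.COMM_WORLD
comm_rank = comm.Get_rank()
num_clients = comm.Get_size() - 1   # one process for the server, one per client

cfg = OmegaConf.structured(Config)
cfg.num_clients = num_clients       # assumed field: number of simulated clients

model = nn.Linear(10, 2)            # toy model
loss_fn = nn.CrossEntropyLoss()
test_data = TensorDataset(torch.randn(16, 10), torch.randint(0, 2, (16,)))

if comm_rank == 0:
    appfl.run_mpi.run_server(cfg, comm, model, loss_fn, num_clients, test_data)
else:
    # Each client process constructs (or loads) its own local training data.
    train_data = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
    appfl.run_mpi.run_client(cfg, comm, model, loss_fn, num_clients, train_data)

Launched with three MPI processes, this sketch runs one server and two clients.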

Note

We assume that MPI process 0 runs the server, and the other processes run clients.

Note

mpiexec may need an additional argument to enable CUDA support: --mca opal_cuda_support 1
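For example, launching the sketch above on three processes with CUDA support enabled could look like the following (the script name is hypothetical):

mpiexec --mca opal_cuda_support 1 -np 3 python ./mpi_example.py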