Simulating PPFL

This package allows users to simulate PPFL on either a single machine or a cluster.

Note

Running (either training or simulating) PPFL on multiple heterogeneous machines is described in Training PPFL.

We describe how to simulate PPFL with a given model and datasets. For simulation, we assume that test_data is available to validate the training.

Serial run

Serial runs begin simply by calling the following API function.

appfl.run_serial.run_serial(cfg: omegaconf.DictConfig, model: torch.nn.Module, loss_fn: torch.nn.Module, train_data: Dataset, test_data: Dataset = torch.utils.data.Dataset, dataset_name: str = 'appfl')[source]

Run serial simulation of PPFL.

Parameters:
  • cfg (DictConfig) – the configuration for this run

  • model (nn.Module) – neural network model to train

  • loss_fn (nn.Module) – loss function

  • train_data (Dataset) – training data

  • test_data (Dataset) – optional testing data. If given, validation will run based on this data.

  • dataset_name (str) – optional dataset name

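As an illustration, the following is a minimal sketch of a serial run. The toy model, the datasets, and the configuration are assumptions made for the example: we assume the default configuration can be built from the appfl.config.Config dataclass (as in the package examples), that cfg.num_clients holds the number of simulated clients, and that one training Dataset is supplied per simulated client.

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
from omegaconf import OmegaConf

import appfl.run_serial
from appfl.config import Config  # assumption: default configuration dataclass

cfg = OmegaConf.structured(Config)  # default configuration for this run
cfg.num_clients = 2                 # assumed field: number of simulated clients

model = nn.Linear(10, 2)            # toy model
loss_fn = nn.CrossEntropyLoss()

# One toy training dataset per simulated client, plus a test set for validation.
train_data = [
    TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
    for _ in range(cfg.num_clients)
]
test_data = TensorDataset(torch.randn(16, 10), torch.randint(0, 2, (16,)))

appfl.run_serial.run_serial(cfg, model, loss_fn, train_data, test_data, "toy")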

Parallel run with MPI

We can parallelize the PPFL simulation by using MPI through the mpi4py package. The following two API functions need to be called for parallelization.

appfl.run_mpi.run_server(cfg: omegaconf.DictConfig, comm: mpi4py.MPI.Comm, model: torch.nn.Module, loss_fn: torch.nn.Module, num_clients: int, test_dataset: Dataset = torch.utils.data.Dataset, dataset_name: str = 'appfl')[source]

Run the PPFL simulation server, which aggregates and updates the global parameters of the model.

Parameters:
  • cfg (DictConfig) – the configuration for this run

  • comm – MPI communicator

  • model (nn.Module) – neural network model to train

  • loss_fn (nn.Module) – loss function

  • num_clients (int) – the number of clients used in PPFL simulation

  • test_dataset (Dataset) – optional testing data. If given, validation will run based on this data.

  • dataset_name (str) – optional dataset name

appfl.run_mpi.run_client(cfg: omegaconf.DictConfig, comm: mpi4py.MPI.Comm, model: torch.nn.Module, loss_fn: torch.nn.Module, num_clients: int, train_data: Dataset, test_data: Dataset = torch.utils.data.Dataset)[source]

Run the PPFL simulation clients, each of which updates its own local parameters of the model.

Parameters:
  • cfg (DictConfig) – the configuration for this run

  • comm – MPI communicator

  • model (nn.Module) – neural network model to train

  • loss_fn (nn.Module) – loss function

  • num_clients (int) – the number of clients used in PPFL simulation

  • train_data (Dataset) – training data

  • test_data (Dataset) – optional testing data. If given, validation will run based on this data.

The server and the clients are started by calling run_server and run_client, respectively, where an MPI communicator (e.g., MPI.COMM_WORLD in the sketch below) is given as an argument.
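The following is a minimal sketch of an MPI-parallel run; as in the serial sketch above, the toy model, datasets, and the appfl.config.Config default configuration are assumptions made for the example. Rank 0 runs the server and every other rank runs one client, and each client is passed its own local Dataset, matching the signatures above.

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
from mpi4py import MPI
from omegaconf import OmegaConf

import appfl.run_mpi
from appfl.config import Config  # assumption: default configuration dataclass

comm = MPI.COMM_WORLD
comm_rank = comm.Get_rank()
num_clients = comm.Get_size() - 1   # one process for the server, one per client

cfg = OmegaConf.structured(Config)
cfg.num_clients = num_clients       # assumed field: number of simulated clients

model = nn.Linear(10, 2)            # toy model
loss_fn = nn.CrossEntropyLoss()
test_data = TensorDataset(torch.randn(16, 10), torch.randint(0, 2, (16,)))

if comm_rank == 0:
    appfl.run_mpi.run_server(cfg, comm, model, loss_fn, num_clients, test_data)
else:
    # Each client process constructs (or loads) its own local training data.
    train_data = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
    appfl.run_mpi.run_client(cfg, comm, model, loss_fn, num_clients, train_data)

Launched with three MPI processes, this sketch runs one server and two clients.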

Note

We assume that MPI process 0 runs the server, and the other processes run clients.

Note

mpiexec may need an additional argument to enable CUDA support: --mca opal_cuda_support 1
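For example, launching the sketch above on three processes with CUDA support enabled could look like the following (the script name is hypothetical):

mpiexec --mca opal_cuda_support 1 -np 3 python ./mpi_example.py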