Simulating PPFL
This package lets users simulate PPFL on either a single machine or a cluster.
Note
Running (either training or simulating) PPFL on multiple heterogeneous machines is described in Training PPFL.
We describe how to simulate PPFL with a given model and datasets. For simulation, we assume that test_data is available to validate the training.
Serial run
A serial run starts with a call to the following API function.
- appfl.run_serial.run_serial(cfg: omegaconf.DictConfig, model: torch.nn.Module, loss_fn: torch.nn.Module, train_data: Dataset, test_data: Dataset = torch.utils.data.Dataset, dataset_name: str = 'appfl')[source]
Run serial simulation of PPFL.
- Parameters:
cfg (DictConfig) – the configuration for this run
model (nn.Module) – neural network model to train
loss_fn (nn.Module) – loss function
train_data (Dataset) – training data
test_data (Dataset) – optional testing data. If given, validation will run based on this data.
dataset_name (str) – optional dataset name
Some remarks:
- Parameter cfg: DictConfig reads the configuration of the run. See How to set configuration for details.
- Parameters model, train_data, and test_data should be given by users; see User-defined model and User-defined dataset.
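As a minimal sketch of the call described above, the snippet below forwards user-defined inputs to run_serial. It assumes that the APPFL package is installed and that cfg, model, loss_fn, and the datasets have already been built as described in the referenced pages. The wrapper name simulate_serial and its runner argument are hypothetical, introduced here only so the call can be dry-run without APPFL.

```python
def simulate_serial(cfg, model, loss_fn, train_data, test_data=None, runner=None):
    """Forward user-defined inputs to a serial PPFL simulation.

    `runner` is a hypothetical hook (not part of APPFL) that replaces
    run_serial, for example to dry-run the call without APPFL installed.
    """
    if runner is None:
        # Imported lazily so this function can be defined without APPFL.
        from appfl.run_serial import run_serial as runner

    # Positional order follows the documented signature:
    # run_serial(cfg, model, loss_fn, train_data, test_data, dataset_name)
    args = [cfg, model, loss_fn, train_data]
    if test_data is not None:
        # test_data is optional; it is passed only when the user supplies it.
        args.append(test_data)
    return runner(*args)
```

With APPFL installed, calling simulate_serial(cfg, model, loss_fn, train_data, test_data) would start the serial simulation and validate against test_data.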
Parallel run with MPI
We can parallelize the PPFL simulation by using MPI through the mpi4py package.
The following two API functions need to be called for parallelization.
- appfl.run_mpi.run_server(cfg: omegaconf.DictConfig, comm: mpi4py.MPI.Comm, model: torch.nn.Module, loss_fn: torch.nn.Module, num_clients: int, test_dataset: Dataset = torch.utils.data.Dataset, dataset_name: str = 'appfl')[source]
Run PPFL simulation server that aggregates and updates the global parameters of model
- Parameters:
cfg (DictConfig) – the configuration for this run
comm – MPI communicator
model (nn.Module) – neural network model to train
loss_fn (nn.Module) – loss function
num_clients (int) – the number of clients used in PPFL simulation
test_dataset (Dataset) – optional testing data. If given, validation will run based on this data.
dataset_name (str) – optional dataset name
- appfl.run_mpi.run_client(cfg: omegaconf.DictConfig, comm: mpi4py.MPI.Comm, model: torch.nn.Module, loss_fn: torch.nn.Module, num_clients: int, train_data: Dataset, test_data: Dataset = torch.utils.data.Dataset)[source]
Run PPFL simulation clients, each of which updates its own local parameters of model
The server and the clients are started by calling run_server and run_client, respectively, each of which takes an MPI communicator (e.g., MPI.COMM_WORLD) as an argument.
Note
We assume that MPI process 0 runs the server, and the other processes run clients.
Note
mpiexec may need an additional argument to use CUDA: --mca opal_cuda_support 1
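The rank convention described above, with process 0 as the server and all other processes as clients, can be sketched as follows. This is an illustrative sketch, not the package's own launcher. It assumes that mpi4py and APPFL are installed, and the names split_roles and launch are hypothetical helpers. The inputs cfg, model, loss_fn, train_data, and test_data are assumed to be prepared by the user beforehand.

```python
def split_roles(world_size):
    """Return (server_rank, client_ranks, num_clients) with rank 0 as the server."""
    if world_size < 2:
        raise ValueError("need at least 2 MPI processes: one server and one client")
    return 0, list(range(1, world_size)), world_size - 1


def launch(cfg, model, loss_fn, train_data, test_data, dataset_name="appfl"):
    """Start the server on rank 0 and a client on every other rank."""
    # Imported lazily so split_roles can be used without MPI or APPFL installed.
    from mpi4py import MPI
    from appfl.run_mpi import run_client, run_server

    comm = MPI.COMM_WORLD
    server_rank, _, num_clients = split_roles(comm.Get_size())

    if comm.Get_rank() == server_rank:
        # The server aggregates and updates the global model parameters.
        run_server(cfg, comm, model, loss_fn, num_clients, test_data, dataset_name)
    else:
        # Each client updates its own local model parameters.
        run_client(cfg, comm, model, loss_fn, num_clients, train_data, test_data)
```

A script that calls launch would be started under MPI, for example with mpiexec -n 5 python simulate.py for one server and four clients, where simulate.py is a placeholder file name.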