From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease

By Zachary Mueller

This tutorial assumes a basic understanding of PyTorch and of how to train a simple model. It showcases training on multiple GPUs with Distributed Data Parallelism (DDP) at three different levels of increasing abstraction:

  • Native PyTorch DDP through the torch.distributed module
  • Utilizing 🤗 Accelerate's light wrapper around torch.distributed, which also helps ensure the code can be run on a single GPU or TPU with minimal changes
  • Utilizing 🤗 Transformers' Trainer, which fully abstracts the distributed setup
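As a reference point for the first level, native PyTorch DDP can be sketched roughly as follows. This is a minimal single-machine sketch, not the article's full example: the gloo backend and the single-process environment defaults are assumptions so that it also runs on CPU. A real multi-GPU run would launch the script with torchrun, which sets RANK and WORLD_SIZE itself, and would typically use the nccl backend.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets these environment variables in real multi-process
    # runs; the defaults below let the sketch run as a single process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    # gloo works on CPU; multi-GPU training would use backend="nccl"
    dist.init_process_group(backend="gloo")

    torch.manual_seed(0)
    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

    # Placeholder data standing in for a real, sharded dataset
    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)

    losses = []
    for _ in range(5):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
        loss.backward()  # DDP hooks synchronize gradients here
        optimizer.step()
        losses.append(loss.item())

    dist.destroy_process_group()
    return losses
```

With multiple processes, each rank would also wrap its dataloader in a `DistributedSampler` so every GPU sees a different shard of the data.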

