A Hugging Face Accelerate Story of Multiple Backends: FSDP and DeepSpeed

There are two popular implementations of the Zero Redundancy Optimizer (ZeRO) algorithm in the community: one from DeepSpeed and the other from PyTorch, as Fully Sharded Data Parallel (FSDP). Hugging Face Accelerate exposes both frameworks so that end users can train or tune their models. This blog highlights the differences in how these backends are exposed through Accelerate. To enable users to switch seamlessly between them, we upstreamed a precision-related change and a concept guide.


