Fit More and Train Faster With ZeRO via DeepSpeed and FairScale
A guest blog post by Hugging Face fellow Stas Bekman
As recent Machine Learning models have been growing much faster than the amount of GPU memory added to newly released cards, many users are unable to train or even just load some of those huge models onto their hardware. While there is an ongoing effort to distill some of those huge models