Fit More and Train Faster With ZeRO via DeepSpeed and FairScale


A guest blog post by Hugging Face fellow Stas Bekman

Recent machine learning models have been growing much faster than the amount of GPU memory added to newly released cards, so many users are unable to train, or even just load, some of those huge models on their hardware. While there is an ongoing effort to distill some of those huge models
