Parallelformers: An Efficient Model Parallelization Toolkit for Deployment

Parallelformers, which is based on Megatron-LM, is designed to make model parallelization easier. You can parallelize various models from HuggingFace Transformers across multiple GPUs with a single line of code. Currently, Parallelformers supports inference only; training features are not included.

Why Parallelformers? You can load a model that is too large for a single GPU. For example, with Parallelformers you can load a 12 GB model onto two 8 GB GPUs. […]
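As a sketch of that one-line usage: the snippet below follows the pattern shown in the project's README, but treat the exact `parallelize` signature (its `num_gpus`, `fp16`, and `verbose` arguments) and the example model name as assumptions here rather than a verified, up-to-date API reference.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from parallelformers import parallelize

    # Load a HuggingFace Transformers model on the CPU first;
    # parallelize() handles moving the shards onto the GPUs.
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

    # The single line that splits the model across two GPUs
    # (arguments assumed from the project README).
    parallelize(model, num_gpus=2, fp16=True, verbose='detail')

    # Inference then proceeds as with any Transformers model.
    inputs = tokenizer("Parallelformers is", return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=5, max_length=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The key design point advertised by the project is that no model-specific rewriting is needed: the same call parallelizes many Transformers architectures for inference.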