Training large models across multiple GPUs can be challenging due to the complexities of different parallelism strategies. In Accelerate, together with Axolotl, we have integrated a quick and easy way to use any combination of parallelism strategies in your training script! Here is how to add it:

```python
from transformers import AutoModelForCausalLM
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import FullyShardedDataParallelPlugin

pc = ParallelismConfig(
    dp_shard_size=2,
    dp_replicate_size=2,
    cp_size=2,
    tp_size=2,
)

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    auto_wrap_policy="transformer_based_wrap",
    # ...
```
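One thing worth checking before launching: the dimension sizes in `ParallelismConfig` compose multiplicatively, so the product of all four sizes should match the number of GPUs you launch with. A quick sanity check in plain Python (values taken from the snippet above; no library calls involved):

```python
# Each parallelism dimension multiplies the GPU requirement:
# dp_shard (FSDP sharding) x dp_replicate (data replicas)
# x cp (context parallel) x tp (tensor parallel)
dp_shard_size = 2
dp_replicate_size = 2
cp_size = 2
tp_size = 2

world_size = dp_shard_size * dp_replicate_size * cp_size * tp_size
print(world_size)  # total GPUs this configuration expects
```

With the 2×2×2×2 configuration above, the job expects 16 GPUs in total.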