Don’t Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers
July 23, 2021
By: Danielle Rothermel, Margaret Li, Tim Rocktäschel, Jakob Foerster

Abstract

Self-supervised pre-training of large-scale transformer models on text corpora, followed by fine-tuning, has achieved state-of-the-art results on a number of natural language processing tasks. Recently, Lu et al. (2021) claimed that frozen pretrained transformers (FPTs) match or outperform training from scratch as well as unfrozen (fine-tuned) pretrained transformers in a set of transfer tasks to other modalities. In our work, we find that this result is, in fact, […]
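The frozen pretrained transformer (FPT) setup from Lu et al. (2021), and the learning-rate sweep the title alludes to, can be illustrated with a short PyTorch sketch. This is only an illustrative approximation, not the authors' code: the model name, the choice of which parameters stay trainable, and the candidate learning rates are assumptions for the example, and train_step is a user-supplied placeholder.

```python
# Illustrative sketch only (not the paper's code): freeze a pretrained GPT-2,
# keep FPT-style trainable parts (layer norms, positional embeddings here),
# and sweep over candidate learning rates instead of fixing a single value.
import torch
from transformers import GPT2Model

def build_fpt(trainable_keywords=("ln_", "wpe")):
    model = GPT2Model.from_pretrained("gpt2")
    for name, param in model.named_parameters():
        # Freeze everything except parameters whose names match the keywords
        # (which parts stay trainable is an assumption for this sketch).
        param.requires_grad = any(k in name for k in trainable_keywords)
    return model

def sweep_learning_rates(train_step, lrs=(1e-2, 1e-3, 1e-4, 1e-5)):
    """Train one model per candidate learning rate; return the best by validation loss."""
    results = {}
    for lr in lrs:
        model = build_fpt()
        params = [p for p in model.parameters() if p.requires_grad]
        optimizer = torch.optim.Adam(params, lr=lr)
        results[lr] = train_step(model, optimizer)  # user-supplied: trains and returns validation loss
    best_lr = min(results, key=results.get)
    return best_lr, results
```

Sweeping the learning rate separately for each configuration, as in the sketch, avoids attributing performance differences to the architecture or freezing scheme when they may instead come from the optimizer setting, which is the concern the paper's title raises.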