The Reformer – Pushing the limits of language modeling

Patrick von Platen

How the Reformer uses less than 8GB of RAM to train on sequences of half a million tokens

The Reformer model, introduced by Kitaev, Kaiser et al. (2020), is one of the most memory-efficient transformer models for long sequence modeling to date.

Recently, long sequence modeling has experienced a surge of interest, as can be seen from the many submissions from this year alone: Beltagy et al. (2020), Roy et al. (2020), Tay et al. (2020), and Wang et al. (2020), to name a few.
The motivation behind long sequence modeling is that many tasks in NLP, e.g. summarization and question answering, require the model to process longer input sequences than models such as BERT are able to handle. In tasks that require the model to process a long input sequence, long-range models do not have to truncate the input to avoid running out of memory, and have thus been shown to outperform standard BERT-like models (cf. Beltagy et al. (2020)).
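As a quick illustration (not part of the original post), the snippet below sketches how one might load a pretrained Reformer checkpoint from the 🤗 Transformers library and compute a language-modeling loss over a long input in a single forward pass. The checkpoint name and the dummy input are illustrative; real use cases would pass an actual long document.

```python
import torch
from transformers import ReformerModelWithLMHead, ReformerTokenizer

# Illustrative checkpoint: a small Reformer trained on "Crime and Punishment".
model_id = "google/reformer-crime-and-punishment"
tokenizer = ReformerTokenizer.from_pretrained(model_id)
model = ReformerModelWithLMHead.from_pretrained(model_id)
model.eval()

# Build a long dummy input; a real use case would tokenize a full document.
text = "The quick brown fox jumps over the lazy dog. " * 500
input_ids = tokenizer(text, return_tensors="pt").input_ids
print("sequence length:", input_ids.shape[-1])

# Compute the language-modeling loss over the entire sequence in one pass;
# a vanilla transformer's attention would need memory quadratic in this length.
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss
print("LM loss:", loss.item())
```

Note that in evaluation mode the Transformers implementation pads the input to a multiple of the attention chunk length automatically, so the sequence length does not have to be chosen by hand here.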
