Transformers-based Encoder-Decoder Models

Patrick von Platen



!pip install transformers==4.2.1
!pip install sentencepiece==0.1.95

The transformer-based encoder-decoder model was introduced by Vaswani
et al. in the famous Attention is all you need paper and is today the
de facto standard encoder-decoder architecture in natural language
processing (NLP).

Recently, there has been a lot of research on different pre-training
objectives for transformer-based encoder-decoder models, e.g. T5,
BART, Pegasus, ProphetNet, and Marge, but the model architecture has
stayed largely the same.

The goal of this blog post is to give a detailed explanation of how
the transformer-based encoder-decoder architecture models
sequence-to-sequence problems. We will focus on the mathematical model
defined by the architecture and on how the model can be used at
inference.
Along the way, we will give some background on sequence-to-sequence
models in NLP and break down the transformer-based encoder-decoder
architecture into its encoder and decoder parts. We provide many
illustrations and establish the link between the theory of
transformer-based encoder-decoder models and their practical usage for
inference.
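As a first taste of that practical usage, here is a minimal sketch of
encoder-decoder inference with the transformers library installed
above. The `t5-small` checkpoint and the translation prompt are
illustrative choices, not prescribed by the post.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load a small pre-trained encoder-decoder model and its tokenizer.
# "t5-small" is an illustrative checkpoint choice.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Tokenize the input sequence; the encoder processes it once.
input_ids = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
).input_ids

# The decoder then generates the output sequence auto-regressively,
# one token at a time, conditioned on the encoder's output.
output_ids = model.generate(input_ids, max_length=40)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Expected output along the lines of: "Das Haus ist wunderbar."
```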
