The Transformer Model

We have already familiarized ourselves with the concept of self-attention as implemented by the Transformer attention mechanism for neural machine translation. We will now shift our focus to the details of the Transformer architecture itself, to discover how self-attention can be implemented without relying on recurrence and convolutions.
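To make this idea concrete before diving into the architecture, the following is a minimal NumPy sketch of scaled dot-product self-attention, in which every position of a sequence attends to every other position through matrix operations rather than a recurrent or convolutional pass. The projection matrices, dimensions, and random inputs are illustrative assumptions for this sketch only, not the configuration used later in the tutorial.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the same input sequence into queries, keys, and values
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Scaled dot-product attention: every position attends to every other position
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example: a sequence of 4 tokens with model dimension 8 (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```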

In this tutorial, you will discover the network architecture of the Transformer model.

After completing this tutorial, you will know:

  • How the Transformer architecture implements an encoder-decoder structure without recurrence and convolutions 
  • How the Transformer encoder and decoder work (a minimal sketch follows this list)
  • How the Transformer self-attention compares to the use of recurrent and convolutional layers 

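As a rough preview of the encoder structure covered in the tutorial, the sketch below composes a single encoder layer from a self-attention sublayer and a position-wise feed-forward sublayer, each wrapped in a residual connection followed by layer normalization. It continues from the previous sketch, reusing `self_attention` and the toy tensors defined there; all weights and sizes are again illustrative assumptions rather than the tutorial's actual configuration.

```python
def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def encoder_layer(X, W_q, W_k, W_v, W_1, W_2):
    # Sublayer 1: self-attention with a residual connection and layer normalization
    attended = layer_norm(X + self_attention(X, W_q, W_k, W_v))
    # Sublayer 2: position-wise feed-forward network, also with residual + norm
    ff = np.maximum(0, attended @ W_1) @ W_2  # ReLU between the two projections
    return layer_norm(attended + ff)

# Illustrative weights: model dimension 8, feed-forward dimension 16
W_1 = rng.normal(size=(8, 16))
W_2 = rng.normal(size=(16, 8))
print(encoder_layer(X, W_q, W_k, W_v, W_1, W_2).shape)  # (4, 8)
```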