Joining the Transformer Encoder and Decoder Plus Masking

Having implemented and tested the Transformer encoder and decoder separately, we can now join the two into a complete model. We will also see how to create padding and look-ahead masks, which suppress the input values that should not be considered in the encoder or decoder computations. Our end goal remains to apply the complete model to Natural Language Processing (NLP).
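
Before diving in, here is a minimal sketch of what such masks might look like in TensorFlow. It assumes that the padding token is 0, and the function names `padding_mask` and `lookahead_mask` are illustrative; how the masks are actually combined with the attention scores is covered later in the tutorial.

```python
import tensorflow as tf

def padding_mask(input_seq):
    # Mark every position equal to 0 (assumed here to be the padding token)
    # with 1.0, so it can later be suppressed in the attention scores
    return tf.cast(tf.math.equal(input_seq, 0), tf.float32)

def lookahead_mask(size):
    # Ones above the main diagonal mark the "future" positions that the
    # decoder should not attend to at each decoding step
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

# Quick check on a toy sequence padded with zeros
seq = tf.constant([[1, 5, 7, 0, 0]])
print(padding_mask(seq))   # [[0. 0. 0. 1. 1.]]
print(lookahead_mask(5))   # strictly upper-triangular matrix of ones
```

In both cases, a value of 1 in the mask marks a position to be ignored, which is the convention we will follow when applying the masks inside the attention computation.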

In this tutorial, you will discover how to implement the complete Transformer model and create padding and look-ahead masks. 

After completing this tutorial, you will know: