How to Configure an Encoder-Decoder Model for Neural Machine Translation

Last Updated on August 7, 2019

The encoder-decoder architecture for recurrent neural networks is achieving state-of-the-art results on standard machine translation benchmarks and is being used at the heart of industrial translation services.

The model is simple, but given the large amount of data required to train it, tuning the myriad design decisions in the model in order to get top performance on your problem can be practically intractable. Thankfully, research scientists have used Google-scale hardware to do this work for us and provide a set of heuristics for how to configure the encoder-decoder model for neural machine translation and for sequence prediction generally.

In this post, you will discover the details of how to best configure an encoder-decoder recurrent neural network for neural machine translation and other natural language processing tasks.

After reading this post, you will know:

  • The Google study that investigated each design decision in the encoder-decoder model in order to isolate its effect.
  • The results and recommendations for design decisions like word embeddings, encoder and decoder depth, and attention mechanisms.
  • A set of base model design decisions that can be used as a starting point on your own sequence-to-sequence projects, as sketched in code after this list.
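To make the design decisions discussed in this post concrete, here is a minimal sketch of where those knobs appear in a Keras encoder-decoder model. The vocabulary sizes, embedding size, number of layers, and hidden units below are illustrative placeholders of my own choosing, not the recommendations from the study, and the model omits details such as a proper inference loop.

```python
# Minimal encoder-decoder sketch (assumed, illustrative hyperparameters only):
# embedding size, encoder/decoder depth, and attention are the "knobs"
# discussed in this post.
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab = 10000, 10000    # assumed vocabulary sizes
embed_dim, units, depth = 512, 512, 2  # embedding size, hidden units, layer depth

# Encoder: embedding followed by a stack of LSTM layers
enc_inputs = layers.Input(shape=(None,), name="source_tokens")
x = layers.Embedding(src_vocab, embed_dim, mask_zero=True)(enc_inputs)
for _ in range(depth):
    x = layers.LSTM(units, return_sequences=True)(x)
enc_outputs = x  # per-timestep encoder states consumed by attention

# Decoder: embedding followed by a stack of LSTM layers (teacher forcing inputs)
dec_inputs = layers.Input(shape=(None,), name="target_tokens")
y = layers.Embedding(tgt_vocab, embed_dim, mask_zero=True)(dec_inputs)
for _ in range(depth):
    y = layers.LSTM(units, return_sequences=True)(y)

# Additive (Bahdanau-style) attention over the encoder outputs
context = layers.AdditiveAttention()([y, enc_outputs])
y = layers.Concatenate()([y, context])

# Project each decoder step onto the target vocabulary
outputs = layers.Dense(tgt_vocab, activation="softmax")(y)

model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

The point of the sketch is simply that each recommendation in the study maps to a single line here: the embedding dimensionality, the depth of the encoder and decoder stacks, and the choice of attention mechanism.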
