Caption Generation with the Inject and Merge Encoder-Decoder Models

Last Updated on August 7, 2019

Caption generation is a challenging artificial intelligence problem that draws on both computer vision and natural language processing.

The encoder-decoder recurrent neural network architecture has been shown to be effective at this problem. The implementation of this architecture can be distilled into inject and merge based models, and both make different assumptions about the role of the recurrent neural network in addressing the problem.

In this post, you will discover the inject and merge architectures for encoder-decoder recurrent neural network models applied to caption generation.

After reading this post, you will know:

  • The challenge of caption generation and the use of the encoder-decoder architecture.
  • The inject model that combines the encoded image with each word to generate the next word in the caption.
  • The merge model that encodes the image and the description separately, then combines the two encodings to generate the next word in the caption.
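The structural difference between the two models can be illustrated with array shapes. The sketch below is illustrative only, assuming hypothetical dimensions and using a simple mean as a stand-in for an RNN's hidden state: the inject model feeds the image features to the RNN at every timestep alongside the word vectors, while the merge model keeps the image out of the RNN and combines it with the sequence encoding afterwards.

```python
import numpy as np

# Hypothetical dimensions for illustration only.
seq_len = 10      # caption prefix length in tokens
embed_dim = 256   # word embedding size
img_dim = 256     # encoded image feature size (e.g. from a CNN)

word_embeddings = np.random.rand(seq_len, embed_dim)  # encoded caption prefix
image_features = np.random.rand(img_dim)              # encoded image

# Inject model: the image features are concatenated with EVERY word vector,
# so the RNN sees image information at each timestep.
inject_inputs = np.concatenate(
    [word_embeddings, np.tile(image_features, (seq_len, 1))], axis=1
)
# Each RNN input step is now (embed_dim + img_dim) wide.
print(inject_inputs.shape)  # (10, 512)

# Merge model: the RNN encodes ONLY the word sequence; the image features
# are merged with the sequence encoding afterwards, outside the RNN.
rnn_summary = word_embeddings.mean(axis=0)  # stand-in for the RNN's final state
merged = np.concatenate([rnn_summary, image_features])
print(merged.shape)  # (512,)
```

In both cases a final feed-forward layer over the combined representation would predict the next word; the models differ only in where the image enters the pipeline.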

Kick-start your project with my new book Deep Learning for Natural Language Processing, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.
