How to Prepare a Photo Caption Dataset for Training a Deep Learning Model

Last Updated on August 7, 2019

Automatic photo captioning is a problem where a model must generate a human-readable textual description given a photograph.

It is a challenging problem in artificial intelligence that requires both image understanding from the field of computer vision and language generation from the field of natural language processing.

It is now possible to develop your own image caption models using deep learning and freely available datasets of photos and their descriptions.

In this tutorial, you will discover how to prepare photos and textual descriptions for developing a deep learning automatic photo caption generation model.

After completing this tutorial, you will know:

  • About the Flickr8K dataset, comprising more than 8,000 photos, each paired with up to 5 captions.
  • How to generally load and prepare photo and text data for modeling with deep learning (a short loading sketch follows this list).
  • How to specifically encode data for two different types of deep learning models in Keras.
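
To give a feel for what this preparation involves, here is a minimal sketch of loading one photo and its captions with Keras. It assumes you have downloaded and unzipped the Flickr8K files; the directory name Flicker8k_Dataset, the caption file Flickr8k.token.txt, and the example photo filename are assumptions based on the standard download and may differ on your machine.

```python
# Minimal sketch: load one Flickr8K photo and gather the captions per photo.
# Paths and the example filename are assumptions; adjust them to your download.
from keras.preprocessing.image import load_img, img_to_array

# load and resize a single photo to a fixed size commonly used by CNNs
filename = 'Flicker8k_Dataset/1000268201_693b08cb0e.jpg'  # assumed example file
image = load_img(filename, target_size=(224, 224))
image = img_to_array(image)
print(image.shape)  # e.g. (224, 224, 3)

# parse the caption file: each line is 'photo_id#n<TAB>caption'
captions = dict()
with open('Flickr8k.token.txt', 'r') as f:
    for line in f:
        tokens = line.strip().split('\t')
        if len(tokens) != 2:
            continue
        image_id, caption = tokens
        image_id = image_id.split('#')[0]  # drop the '#0'..'#4' suffix
        captions.setdefault(image_id, []).append(caption)

print(captions.get('1000268201_693b08cb0e.jpg'))
```

The rest of the tutorial builds on this idea, turning the loaded photos and cleaned captions into encodings suitable for training in Keras.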

Kick-start your project with my new book Deep Learning for Natural Language Processing, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.