How to One Hot Encode Sequence Data in Python

Last Updated on August 14, 2019

Many machine learning algorithms cannot work with categorical data directly.

Categorical data must be converted to numbers.

This also applies when you are working on a sequence classification problem and plan to use deep learning methods such as Long Short-Term Memory (LSTM) recurrent neural networks.
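To make the conversion from categories to numbers concrete, here is a minimal pure-Python sketch. It first maps each category to an integer (an integer encoding) and then expands each integer into a binary vector with a single 1 (a one hot encoding). The category names, the example sequence, and the helper names integer_encode and one_hot_encode are hypothetical illustrations, not part of any library.

```python
# Minimal sketch: convert a categorical sequence to numbers by hand.
# The alphabet, the sequence, and the helper names are hypothetical examples.


def integer_encode(sequence, alphabet):
    """Map each category to its position in the alphabet."""
    category_to_int = {category: i for i, category in enumerate(alphabet)}
    return [category_to_int[category] for category in sequence]


def one_hot_encode(integers, num_classes):
    """Expand each integer into a binary vector with a 1 at that index."""
    vectors = []
    for value in integers:
        vector = [0] * num_classes
        vector[value] = 1
        vectors.append(vector)
    return vectors


alphabet = ["cold", "warm", "hot"]
sequence = ["cold", "cold", "warm", "hot", "cold"]

integers = integer_encode(sequence, alphabet)
one_hot = one_hot_encode(integers, len(alphabet))

# The encoding is reversible: the index of the 1 recovers the integer,
# and the integer indexes back into the alphabet.
decoded = [alphabet[vector.index(1)] for vector in one_hot]

print(integers)  # [0, 0, 1, 2, 0]
for vector in one_hot:
    print(vector)
print(decoded)  # ['cold', 'cold', 'warm', 'hot', 'cold']
```

The intermediate integer step is kept separate so each stage can be inspected on its own; the one hot vectors avoid implying an order (such as "hot" > "warm") that the raw integers might suggest to a model.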

In this tutorial, you will discover how to convert your input or output sequence data to a one hot encoding for use in sequence classification problems with deep learning in Python.

After completing this tutorial, you will know:

  • What an integer encoding and one hot encoding are and why they are necessary in machine learning.
  • How to calculate an integer encoding and one hot encoding by hand in Python.
  • How to use the scikit-learn and Keras libraries to automatically encode your sequence data in Python.


Let’s get started.