Why One-Hot Encode Data in Machine Learning?

Last Updated on June 30, 2020

Getting started in applied machine learning can be difficult, especially when working with real-world data.

Often, machine learning tutorials will recommend or require that you prepare your data in specific ways before fitting a machine learning model.

One good example is to use a one-hot encoding on categorical data.

Why is a one-hot encoding required?
Why can’t you fit a model on your data directly?

In this post, you will discover the answer to these important questions and better understand data preparation in general in applied machine learning.

Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Why One-Hot Encode Data in Machine Learning?
Photo by Karan Jain, some rights reserved.

What is Categorical Data?

Categorical data are variables that contain label values rather than numeric values.

The number of possible values is often limited to a fixed set.

Categorical variables are often called To finish reading, please visit source site

Data Preparation