How To Prepare Your Data For Machine Learning in Python with Scikit-Learn

Last Updated on December 11, 2019

Many machine learning algorithms make assumptions about your data.

It is often a very good idea to prepare your data in such way to best expose the structure of the problem to the machine learning algorithms that you intend to use.

In this post you will discover how to prepare your data for machine learning in Python using scikit-learn.

Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Update Mar/2018: Added alternate link to download the dataset as the original appears to have been taken down.
How To Prepare Your Data For Machine Learning in Python with Scikit-Learn

How To Prepare Your Data For Machine Learning in Python with Scikit-Learn
Photo by Vinoth Chandar, some rights reserved.

Need For Data Preprocessing

You almost always need to preprocess your data. It is a required step.

A difficulty is that different algorithms make different assumptions about your data and may require different transforms. Further,
To finish reading, please visit source site