How to Normalize and Standardize Your Machine Learning Data in Weka

Last Updated on December 11, 2019

Machine learning algorithms make assumptions about the dataset you are modeling.

Often, raw data is comprised of attributes with varying scales. For example, one attribute may be in kilograms and another may be a count. Although not required, you can often get a boost in performance by carefully choosing methods to rescale your data.

In this post you will discover how you can rescale your data so that all of the data has the same scale.

After reading this post you will know:

  • How to normalize your numeric attributes between the range of 0 and 1.
  • How to standardize your numeric attributes to have a 0 mean and unit variance.
  • When to choose normalization or standardization.

Kick-start your project with my new book Machine Learning Mastery With Weka, including step-by-step tutorials and clear screenshots for all examples.

Let’s get started.

  • Update March/2018: Added alternate link to download the dataset as the original appears to have been taken down.

Predict the Onset of Diabetes

The dataset used for this example is the Pima Indians onset of diabetes dataset.

It is a classification problem where each instance represents medical details for one patient
To finish reading, please visit source site