How to Transform Your Machine Learning Data in Weka

Last Updated on December 13, 2019

Often your raw data for machine learning is not in an ideal form for modeling.

You need to prepare or reshape it to meet the expectations of different machine learning algorithms.

In this post you will discover two techniques that you can use to transform your machine learning data so that it is ready for modeling.

After reading this post you will know:

  • How to convert a real-valued attribute into a discrete attribute, a process called discretization.
  • How to convert a discrete attribute into multiple binary-valued attributes, called dummy variables.
  • When to discretize or create dummy variables from your data.


Let’s get started.

  • Update Mar/2018: Added alternate link to download the dataset as the original appears to have been taken down.

Discretize Numerical Attributes

Some machine learning algorithms prefer or find it easier to work with discrete attributes.

For example, decision tree algorithms can choose split points in real-valued attributes, but the resulting trees are much cleaner when split points are chosen between bins or predefined groups of the real-valued attribute.
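To make the idea of bins concrete, here is a minimal sketch of equal-width binning in plain Python. It is an illustration only, not Weka's own implementation: the function name equal_width_bins, the sample ages, and the choice of three bins are all assumptions made for this example.

```python
def equal_width_bins(values, num_bins):
    """Map each real value to a bin index in the range 0 .. num_bins - 1.

    Hypothetical helper for illustration; not part of Weka.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    bins = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value would land one past the last bin, so clamp it.
        bins.append(min(index, num_bins - 1))
    return bins


# Made-up sample data: a real-valued "age" attribute.
ages = [21.0, 25.0, 36.0, 41.0, 52.0, 60.0]
binned = equal_width_bins(ages, 3)
print(binned)
```

Each age is replaced by the index of the interval it falls into, so an algorithm that prefers discrete attributes sees three groups rather than six distinct real values.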

Discrete attributes are those that describe