Information Gain and Mutual Information for Machine Learning

Last Updated on August 28, 2020

Information gain calculates the reduction in entropy or surprise from transforming a dataset in some way.

It is commonly used in the construction of decision trees from a training dataset: the information gain of each variable is evaluated, and the variable that maximizes the gain is chosen for the split. Maximizing information gain minimizes entropy and produces the split that best separates the dataset into groups for effective classification.
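As a minimal sketch of that calculation, the Python snippet below scores a single candidate split by comparing the entropy of a parent node with the weighted entropy of the two groups the split produces. The class counts and group sizes are illustrative assumptions, not taken from any real dataset.

```python
from math import log2

def entropy(class_probabilities):
	# Shannon entropy in bits for a list of class probabilities
	return -sum(p * log2(p) for p in class_probabilities if p > 0)

# parent node: 20 samples, 13 in class 0 and 7 in class 1 (illustrative counts)
parent_entropy = entropy([13 / 20, 7 / 20])

# candidate split producing two groups of 8 and 12 samples
left_entropy = entropy([7 / 8, 1 / 8])     # group 1: mostly class 0
right_entropy = entropy([6 / 12, 6 / 12])  # group 2: evenly mixed

# information gain = parent entropy minus the weighted entropy of the groups
gain = parent_entropy - (8 / 20 * left_entropy + 12 / 20 * right_entropy)
print('Information Gain: %.3f bits' % gain)
```

A decision tree algorithm would repeat this scoring for every candidate variable (and threshold) and keep the split with the largest gain.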

Information gain can also be used for feature selection, by evaluating the gain of each variable in the context of the target variable. In this slightly different usage, the calculation is referred to as mutual information between the two random variables.
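As a rough illustration of that feature selection usage, the sketch below estimates the mutual information between each input variable and the target on a small synthetic classification dataset using scikit-learn's mutual_info_classif function; the dataset and its parameters are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# small synthetic dataset with a mix of informative and noise features
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2,
                           n_redundant=0, random_state=1)

# estimate the mutual information between each feature and the target
scores = mutual_info_classif(X, y, random_state=1)
for i, score in enumerate(scores):
	print('Feature %d: %.3f' % (i, score))
```

Features with higher scores carry more information about the target and would be preferred when selecting a subset of variables.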

In this post, you will discover information gain and mutual information in machine learning.

After reading this post, you will know:

  • Information gain is the reduction in entropy or surprise by transforming a dataset and is often used in training decision trees.
  • Information gain is calculated by comparing the entropy of the dataset before and after a transformation.
  • Mutual information calculates the statistical dependence between two variables and is the name given to information gain when applied to variable selection.
