A Gentle Introduction to the Chi-Squared Test for Machine Learning

Last Updated on October 31, 2019

A common problem in applied machine learning is determining whether input features are relevant to the outcome to be predicted.

This is the problem of feature selection.

In the case of classification problems where the input variables are also categorical, we can use statistical tests to determine whether the output variable is dependent on or independent of each input variable. If an input variable is independent of the output, it is a candidate for removal from the dataset, because it may be irrelevant to the problem.
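As a minimal sketch of this screening idea, assuming NumPy and SciPy are available, the snippet below builds a contingency table for each categorical input against the class label and flags inputs whose chi-squared test does not reject independence at a chosen significance level. The helper names `contingency_table` and `independence_candidates`, the feature names, and the toy data are hypothetical. A flagged feature is only a candidate for removal, not proof that it is irrelevant.

```python
import numpy as np
from scipy.stats import chi2_contingency


def contingency_table(feature, target):
    """Hypothetical helper: count each (feature value, target value) pair."""
    # np.unique returns the sorted distinct values and, with return_inverse,
    # the position of each observation's value in that sorted array.
    _, row_idx = np.unique(feature, return_inverse=True)
    _, col_idx = np.unique(target, return_inverse=True)
    table = np.zeros((row_idx.max() + 1, col_idx.max() + 1), dtype=int)
    # Add 1 to the cell of every observed pair (unbuffered, so repeats count).
    np.add.at(table, (row_idx, col_idx), 1)
    return table


def independence_candidates(features, target, alpha=0.05):
    """Hypothetical helper: names of features that look independent of target."""
    candidates = []
    for name, values in features.items():
        _, p_value, _, _ = chi2_contingency(contingency_table(values, target))
        # Fail to reject H0 (independence): the feature may be irrelevant.
        if p_value > alpha:
            candidates.append(name)
    return candidates


# Hypothetical toy data: 40 observations with a binary class label.
target = ["yes"] * 20 + ["no"] * 20
features = {
    "color": ["red"] * 20 + ["blue"] * 20,  # tracks the label exactly
    "size": ["small", "large"] * 20,        # same split within both classes
}
print(independence_candidates(features, target))  # -> ['size']
```

Here "color" is perfectly associated with the label, so its test rejects independence, while "size" is split evenly within each class and is flagged as a removal candidate.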

Pearson’s chi-squared statistical hypothesis test is an example of a test for independence between categorical variables.
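The test statistic compares each observed count with the count expected if the two variables were independent, where a cell's expected count is its row total times its column total divided by the grand total. The statistic is the sum over all cells of (observed - expected)^2 / expected, with (rows - 1) * (columns - 1) degrees of freedom. A minimal NumPy sketch of that calculation follows; the function name `pearson_chi2` and the 2x3 table of counts are hypothetical, and no continuity correction is applied.

```python
import numpy as np


def pearson_chi2(observed):
    """Hypothetical helper: Pearson's chi-squared statistic, expected table, dof."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1)
    col_totals = observed.sum(axis=0)
    # Expected count for each cell if rows and columns were independent.
    expected = np.outer(row_totals, col_totals) / observed.sum()
    # Squared deviations, each scaled by its expected count, summed over cells.
    statistic = ((observed - expected) ** 2 / expected).sum()
    # Degrees of freedom: (rows - 1) * (columns - 1).
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return statistic, expected, dof


# Hypothetical counts: 2 categories of one variable by 3 of another.
table = [[10, 20, 30],
         [6, 9, 17]]
statistic, expected, dof = pearson_chi2(table)
print(round(statistic, 3), dof)  # -> 0.272 2
```

A small statistic like this one means the observed counts sit close to the expected ones, which is what independence predicts.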

In this tutorial, you will discover the chi-squared statistical hypothesis test for quantifying the independence of pairs of categorical variables.

After completing this tutorial, you will know:

  • Pairs of categorical variables can be summarized using a contingency table.
  • The chi-squared test can compare an observed contingency table to an expected table and determine if the categorical variables are independent.
  • How to calculate and interpret the chi-squared test for categorical variables in Python.
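Putting these pieces together, a minimal sketch of calculating and interpreting the test in Python might look like the following. It assumes SciPy's `scipy.stats.chi2_contingency`, which returns the statistic, p-value, degrees of freedom, and expected table (and by default applies Yates' continuity correction when there is one degree of freedom), plus `scipy.stats.chi2.ppf` for the critical value. The `interpret` helper, the significance level of 0.05, and both tables of counts are illustrative assumptions.

```python
from scipy.stats import chi2, chi2_contingency


def interpret(observed, alpha=0.05):
    """Hypothetical helper: run the test and interpret it two equivalent ways."""
    stat, p_value, dof, expected = chi2_contingency(observed)
    # Critical value: the statistic exceeded with probability alpha under H0.
    critical = chi2.ppf(1 - alpha, dof)
    return {
        "statistic": stat,
        "p_value": p_value,
        "critical": critical,
        # Reject H0 (independence) if the statistic reaches the critical value...
        "dependent_by_stat": bool(stat >= critical),
        # ...or, equivalently, if the p-value is at most alpha.
        "dependent_by_p": bool(p_value <= alpha),
    }


# Hypothetical contingency tables of observed counts.
weak = [[10, 20, 30], [6, 9, 17]]
strong = [[30, 5], [5, 30]]
for name, table in [("weak", weak), ("strong", strong)]:
    result = interpret(table)
    verdict = ("dependent (reject H0)" if result["dependent_by_p"]
               else "independent (fail to reject H0)")
    print(f"{name}: stat={result['statistic']:.3f}, p={result['p_value']:.4f}, "
          f"critical={result['critical']:.3f} -> {verdict}")
```

Both checks give the same decision, because the p-value is at most alpha exactly when the statistic reaches the critical value; the p-value form is usually the more convenient one to report.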
