How to Calculate McNemar’s Test to Compare Two Machine Learning Classifiers

Last Updated on August 8, 2019

The choice of a statistical hypothesis test is a challenging open problem for interpreting machine learning results.

In his widely cited 1998 paper, Thomas Dietterich recommended McNemar's test for cases where it is expensive or impractical to train multiple copies of classifier models.

This describes the current situation with deep learning models, which are both very large and trained and evaluated on large datasets, often requiring days or weeks to train a single model.

In this tutorial, you will discover how to use McNemar's statistical hypothesis test to compare machine learning classifier models on a single test dataset.

After completing this tutorial, you will know:

  • Why McNemar's test is recommended for models that are expensive to train, which suits large deep learning models.
  • How to transform the prediction results from two classifiers into a contingency table, and how the table is used to calculate the McNemar's test statistic.
  • How to calculate McNemar's test in Python and interpret and report the result (see the sketch after this list).
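As a rough, self-contained sketch of what the tutorial works through, the snippet below builds the 2x2 contingency table from the predictions of two classifiers on the same test set and passes it to the mcnemar() function from statsmodels.stats.contingency_tables. The labels and predictions are made-up placeholders for illustration only.

```python
# Sketch: compare two classifiers with McNemar's test on one test set.
# The labels and predictions below are illustrative placeholders.
from statsmodels.stats.contingency_tables import mcnemar

# ground-truth labels and predictions from two hypothetical classifiers
y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 1]
pred_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
pred_b = [0, 1, 0, 0, 1, 1, 1, 1, 1, 0]

# build the 2x2 contingency table of correct/incorrect agreement
both_correct = a_only = b_only = both_wrong = 0
for truth, a, b in zip(y_true, pred_a, pred_b):
    a_ok, b_ok = (a == truth), (b == truth)
    if a_ok and b_ok:
        both_correct += 1
    elif a_ok:
        a_only += 1      # classifier A correct, B incorrect
    elif b_ok:
        b_only += 1      # classifier B correct, A incorrect
    else:
        both_wrong += 1
table = [[both_correct, a_only],
         [b_only, both_wrong]]

# the exact binomial form is advisable when the discordant counts are small
result = mcnemar(table, exact=True)
print('statistic=%.3f, p-value=%.3f' % (result.statistic, result.pvalue))

# interpret: H0 is that the two classifiers have the same proportion of errors
alpha = 0.05
if result.pvalue > alpha:
    print('Same proportions of errors (fail to reject H0)')
else:
    print('Different proportions of errors (reject H0)')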

Kick-start your project with my new book Statistics for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.