Nested Cross-Validation for Machine Learning with Python

Last Updated on August 28, 2020

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training.

This procedure can be used both when optimizing the hyperparameters of a model on a dataset and when comparing and selecting a model for the dataset. When the same cross-validation procedure and dataset are used to both tune and select a model, the result is likely to be an optimistically biased evaluation of the model's performance.

One approach to overcoming this bias is to nest the hyperparameter optimization procedure under the model selection procedure. This is called double cross-validation or nested cross-validation and is the preferred way to evaluate and compare tuned machine learning models.
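The nesting described above can be sketched in scikit-learn by passing a hyperparameter search object as the estimator to an outer cross-validation loop. This is a minimal sketch, not a definitive implementation: the synthetic dataset, the decision tree model, the parameter grid, and the fold counts are all assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumed for illustration only).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_redundant=5, random_state=1
)

# Inner loop: tunes hyperparameters using only the outer training folds.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
# Outer loop: evaluates the whole "tune, then refit" procedure on held-out folds.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Hypothetical search space; any model and grid could be substituted here.
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="accuracy",
    cv=inner_cv,
    refit=True,
)

# Each outer fold runs a full grid search on its training portion, then
# scores the refit best model on the outer test portion.
scores = cross_val_score(search, X, y, scoring="accuracy", cv=outer_cv)
print("Accuracy: %.3f (%.3f)" % (scores.mean(), scores.std()))
```

Because the outer test folds are never seen by the inner search, the mean of `scores` estimates the performance of the tuning procedure as a whole rather than of one particular hyperparameter configuration.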

In this tutorial, you will discover nested cross-validation for evaluating tuned machine learning models.

After completing this tutorial, you will know:

  • Hyperparameter optimization can overfit a dataset and produce an optimistic evaluation of a model, so that evaluation should not be used for model selection.
  • Nested cross-validation provides a way to reduce the bias in combined hyperparameter tuning and model selection.
  • How to implement nested cross-validation for evaluating tuned machine learning algorithms in scikit-learn.
