A Gentle Introduction to k-fold Cross-Validation

Last Updated on August 3, 2020

Cross-validation is a statistical method used to estimate the skill of machine learning models.

It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem because it is easy to understand, easy to implement, and results in skill estimates that generally have a lower bias than other methods.

In this tutorial, you will discover a gentle introduction to the k-fold cross-validation procedure for estimating the skill of machine learning models.

After completing this tutorial, you will know:

  • That k-fold cross validation is a procedure used to estimate the skill of the model on new data.
  • There are common tactics that you can use to select the value of k for your dataset.
  • There are commonly used variations on cross-validation such as stratified and repeated that are available in scikit-learn.

Kick-start your project with my new book Statistics for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Updated Jul/2020: Added links to related types of cross-validation.
A Gentle Introduction to k-fold Cross-ValidationTo finish reading, please visit source site