How to Fix k-Fold Cross-Validation for Imbalanced Classification

Last Updated on July 31, 2020

Model evaluation involves using the available dataset to fit a model and estimate its performance when making predictions on unseen examples.

It is a challenging problem: both the training dataset used to fit the model and the test set used to evaluate it must be sufficiently large and representative of the underlying problem, otherwise the resulting estimate of model performance may be too optimistic or too pessimistic.

The two most common approaches used for model evaluation are the train/test split and the k-fold cross-validation procedure. Both approaches can be very effective in general, but they can produce misleading results, and potentially fail outright, when used on classification problems with a severe class imbalance. Instead, these techniques must be modified so that the sampling is stratified by the class label, giving what are called the stratified train-test split and stratified k-fold cross-validation.
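As a concrete illustration, the following is a minimal sketch using scikit-learn. It assumes a synthetic imbalanced dataset generated with make_classification using a 99:1 class weighting (a choice made here for demonstration, not taken from the text), and shows how the stratify argument of train_test_split and the StratifiedKFold class keep the class distribution consistent across splits.

```python
# Minimal sketch (assumed setup): stratified train/test split and
# stratified k-fold cross-validation on a synthetic imbalanced dataset.
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold

# synthetic two-class dataset with roughly a 99:1 class ratio
X, y = make_classification(n_samples=1000, weights=[0.99, 0.01], random_state=1)

# stratified train/test split: pass the class labels to the stratify argument
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1)
print('Train:', Counter(y_train), 'Test:', Counter(y_test))

# stratified k-fold cross-validation: each fold keeps the same class ratio
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for train_ix, test_ix in kfold.split(X, y):
    print('Fold test composition:', Counter(y[test_ix]))
```

Printing the class counts of each split is a quick way to confirm that the minority class is represented in the same proportion in every train and test set.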

In this tutorial, you will discover how to evaluate classifier models on imbalanced datasets.

After completing this tutorial, you will know:

  • The challenge of evaluating classifiers on datasets using train/test splits and cross-validation.
  • How a naive application of k-fold cross-validation and train-test splits will fail when evaluating classifiers on imbalanced datasets.
  • How modified k-fold cross-validation and train-test splits can be used to preserve the class distribution in the dataset (see the comparison sketch after this list).
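To make the failure mode in the second point concrete, here is a small comparison sketch, again assuming the synthetic 99:1 dataset used above. A naive KFold can hold out very few, or even zero, minority-class examples in a given fold, whereas StratifiedKFold keeps the minority class represented in every fold.

```python
# Sketch comparing naive and stratified k-fold on an imbalanced dataset
# (assumed synthetic 99:1 data; exact fold counts vary with random_state).
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold

X, y = make_classification(n_samples=1000, weights=[0.99, 0.01], random_state=1)

for name, cv in [('naive', KFold(n_splits=5, shuffle=True, random_state=1)),
                 ('stratified', StratifiedKFold(n_splits=5, shuffle=True, random_state=1))]:
    print(name)
    for _, test_ix in cv.split(X, y):
        # count how many examples of each class are held out in this fold
        print('  fold test composition:', Counter(y[test_ix]))
```

With only about ten minority examples in total, the naive folds can end up with unevenly distributed (or missing) minority examples, which is exactly why the stratified variants are preferred for imbalanced classification.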