Best Resources for Imbalanced Classification

Last Updated on January 14, 2020

Classification is a predictive modeling problem that involves predicting a class label for a given example.

It is generally assumed that the distribution of examples in the training dataset is even across all of the classes. In practice, this is rarely the case.

Classification problems where the distribution of examples across the class labels is not equal (i.e. is skewed) are called “imbalanced classification” problems.

Typically, a slight imbalance is not a problem and standard machine learning techniques can be used. Where the imbalance is severe, such as a minority-to-majority class ratio of 1:100, 1:1000, or more extreme, specialized techniques are required.

Specialized techniques are required for severely imbalanced classification problems because most machine learning models used for classification were designed and tested around the assumption that the class distribution is equal. As such, they often fail or produce misleading results.
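To make the point about misleading results concrete, here is a minimal sketch in plain Python. It assumes a synthetic label set with a 1:100 ratio (10 minority examples, 1,000 majority examples) and a hypothetical baseline that always predicts the majority class. The helper names are illustrative only and do not come from any library.

```python
from collections import Counter


def majority_class_baseline(labels):
    """Return the most common label, ignoring all input features."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Assumed synthetic data: 1:100 ratio of minority (1) to majority (0) labels.
y_true = [1] * 10 + [0] * 1000

# The baseline "model" predicts the majority class for every example.
majority = majority_class_baseline(y_true)
y_pred = [majority] * len(y_true)

acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)

print(f"Class counts: {dict(Counter(y_true))}")
print(f"Accuracy: {acc:.3f}")
print(f"Minority class recall: {minority_recall:.3f}")
```

Under these assumptions, the baseline scores about 99 percent accuracy while detecting none of the minority examples, which is why accuracy alone can be misleading on a severely imbalanced dataset.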

In this tutorial, you will discover the best resources that you can use to get started with imbalanced classification.

After completing this tutorial, you will know: