How to Develop Super Learner Ensembles in Python

Last Updated on August 17, 2020

Selecting a machine learning algorithm for a predictive modeling problem involves evaluating many different models and model configurations using k-fold cross-validation.

The super learner is an ensemble machine learning algorithm that combines all of the models and model configurations that you might investigate for a predictive modeling problem and uses them to make a prediction as-good-as or better than any single model that you may have investigated.

The super learner algorithm is an application of stacked generalization, called stacking or blending, to k-fold cross-validation where all models use the same k-fold splits of the data and a meta-model is fit on the out-of-fold predictions from each model.

In this tutorial, you will discover the super learner ensemble machine learning algorithm.

After completing this tutorial, you will know:

  • Super learner is the application of stacked generalization using out-of-fold predictions during k-fold cross-validation.
  • The super learner ensemble algorithm is straightforward to implement in Python using scikit-learn models.
  • The ML-Ensemble (mlens) library provides a convenient implementation that allows the super learner to be fit and used in just a few lines of code.

Let’s get started.