Develop an Intuition for How Ensemble Learning Works

Ensemble learning is a machine learning method that combines the predictions from multiple models in an effort to achieve better predictive performance. There are many different types of ensembles, although all approaches have two key properties: they require that the contributing models are different so that they make different errors, and they combine the predictions in an attempt to harness what each different model does well. Nevertheless, it is not clear how ensembles manage to achieve this, especially in the context […]
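
As a rough intuition for why models that make different errors help, the sketch below computes how often a majority vote is correct. It assumes a binary classification task and models whose errors are fully independent, which real models rarely achieve; the function name `majority_vote_accuracy` and the 0.7 per-model accuracy are illustrative choices, not taken from the post.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes binary classification and statistically independent errors.
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid tied votes")
    min_votes = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(min_votes, n_models + 1)
    )


# Each model alone is right 70% of the time (an assumed figure).
for n in (1, 3, 5, 11, 21):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

Under these assumptions the vote becomes more reliable as members are added. If the models made identical errors instead, the vote would be no better than a single model.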

How to Identify Overfitting Machine Learning Models in Scikit-Learn

Last Updated on November 27, 2020

Overfitting is a common explanation for the poor performance of a predictive model. An analysis of learning dynamics can help to identify whether a model has overfit the training dataset and may suggest an alternate configuration to use that could result in better predictive performance. Performing an analysis of learning dynamics is straightforward for algorithms that learn incrementally, like neural networks, but it is less clear how we might perform the same analysis with […]

Multivariate Adaptive Regression Splines (MARS) in Python

Multivariate Adaptive Regression Splines, or MARS, is an algorithm for complex non-linear regression problems. The algorithm involves finding a set of simple linear functions that in aggregate result in the best predictive performance. In this way, MARS is a type of ensemble of simple linear functions and can achieve good performance on challenging regression problems with many input variables and complex non-linear relationships. In this tutorial, you will discover how to develop Multivariate Adaptive Regression Spline models in Python. After […]

Develop a Bagging Ensemble with Different Data Transformations

Bootstrap aggregation, or bagging, is an ensemble where each model is trained on a different sample of the training dataset. The idea of bagging can be generalized to other techniques for changing the training dataset and fitting the same model on each changed version of the data. One approach is to use data transforms that change the scale and probability distribution of input variables as the basis for the training of contributing members to a bagging-like ensemble. We can refer […]
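
A minimal sketch of the idea, assuming scikit-learn is available: the same model is fit on differently transformed views of one training set and the members vote. The synthetic dataset, the choice of k-nearest neighbors, and the three transforms are illustrative assumptions rather than the configuration used in the tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer, StandardScaler

# Synthetic data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Each transform changes the scale or distribution of the inputs.
transforms = [
    ("minmax", MinMaxScaler()),
    ("standard", StandardScaler()),
    ("quantile", QuantileTransformer(n_quantiles=100, output_distribution="normal")),
]

# One pipeline per transform, each ending in the same type of model.
members = [
    (name, Pipeline([("transform", transform), ("model", KNeighborsClassifier())]))
    for name, transform in transforms
]

# Hard voting: predict the class chosen by most members.
ensemble = VotingClassifier(estimators=members, voting="hard")
ensemble.fit(X_train, y_train)
transform_ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"Accuracy: {transform_ensemble_accuracy:.3f}")
```

A scale-sensitive model such as k-nearest neighbors is used here because tree-based models are largely unaffected by monotonic rescaling, which would leave the members nearly identical.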

How to Develop a Feature Selection Subspace Ensemble in Python

Random subspace ensembles consist of the same model fit on different randomly selected groups of input features (columns) in the training dataset. There are many ways to choose groups of features in the training dataset, and feature selection is a popular class of data preparation techniques designed specifically for this purpose. The features selected by different configurations of the same feature selection method and different feature selection methods entirely can be used as the basis for ensemble learning. In this […]
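
The sketch below illustrates one way this could look, assuming scikit-learn: the same model is fit on feature subsets chosen by different configurations of a single feature selection method. The use of `SelectKBest` with the ANOVA F-statistic, the values of `k`, and the synthetic dataset are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data stands in for a real dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, n_redundant=5, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# One member per configuration: keep the k highest-scoring features.
members = []
for k in (5, 10, 15):
    pipeline = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    members.append((f"anova_k{k}", pipeline))

# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(estimators=members, voting="soft")
ensemble.fit(X_train, y_train)
subspace_ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"Accuracy: {subspace_ensemble_accuracy:.3f}")
```

Placing the selector inside each pipeline means features are chosen using only the training data passed to `fit`, which avoids leaking information from the test set.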

A Gentle Introduction to PyCaret for Machine Learning

PyCaret is an open-source Python machine learning library designed to make performing standard tasks in a machine learning project easy. It is a Python version of the Caret machine learning package in R, popular because it allows models to be evaluated, compared, and tuned on a given dataset with just a few lines of code. The PyCaret library provides these features, allowing the machine learning practitioner in Python to spot-check a suite of standard machine learning algorithms on […]

Extreme Gradient Boosting (XGBoost) Ensemble in Python

Extreme Gradient Boosting (XGBoost) is an open-source library that provides an efficient and effective implementation of the gradient boosting algorithm. Although other open-source implementations of the approach existed before XGBoost, the release of XGBoost appeared to unleash the power of the technique and made the applied machine learning community take notice of gradient boosting more generally. Shortly after its development and initial release, XGBoost became the go-to method and often the key component in winning solutions for classification and regression […]

How to Develop a Light Gradient Boosted Machine (LightGBM) Ensemble

Light Gradient Boosted Machine, or LightGBM for short, is an open-source library that provides an efficient and effective implementation of the gradient boosting algorithm. LightGBM extends the gradient boosting algorithm by adding a type of automatic feature selection as well as focusing on boosting examples with larger gradients. This can result in a dramatic speedup of training and improved predictive performance. As such, LightGBM has become a de facto algorithm for machine learning competitions when working with tabular data for […]

How to Develop Random Forest Ensembles With XGBoost

The XGBoost library provides an efficient implementation of gradient boosting that can be configured to train random forest ensembles. Random forest is a simpler algorithm than gradient boosting. The XGBoost library allows the models to be trained in a way that repurposes and harnesses the computational efficiencies implemented in the library for training random forest models. In this tutorial, you will discover how to use the XGBoost library to develop random forest ensembles. After completing this tutorial, you will know: […]

Blending Ensemble Machine Learning With Python

Blending is an ensemble machine learning algorithm. It is a colloquial name for a stacked generalization, or stacking, ensemble in which the meta-model is fit on predictions made on a holdout dataset rather than on out-of-fold predictions made by the base models. Blending was used to describe stacking models that combined many hundreds of predictive models by competitors in the $1M Netflix machine learning competition, and as such, remains a popular technique and name for stacking in competitive machine learning […]
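
The holdout-based fitting described above can be sketched as follows, assuming scikit-learn. The base models, the logistic regression meta-model, the split sizes, and the helper name `meta_features` are all illustrative assumptions, not the setup from the tutorial.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Split into train+holdout and test, then split off the holdout set.
X_full, X_test, y_full, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
X_train, X_hold, y_train, y_hold = train_test_split(X_full, y_full, test_size=0.33, random_state=1)

# Base models are fit on the training portion only.
base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=1)]
for model in base_models:
    model.fit(X_train, y_train)


def meta_features(models, X):
    """Stack each base model's positive-class probability as one column."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


# The meta-model is fit on base-model predictions for the holdout set.
meta_model = LogisticRegression()
meta_model.fit(meta_features(base_models, X_hold), y_hold)

# To predict, pass new data through the base models, then the meta-model.
y_pred = meta_model.predict(meta_features(base_models, X_test))
blend_accuracy = accuracy_score(y_test, y_pred)
print(f"Blending accuracy: {blend_accuracy:.3f}")
```

Because the base models never see the holdout rows during training, their predictions on those rows resemble predictions on new data, which is what the meta-model needs to learn how to combine them.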
