Train-Test Split for Evaluating Machine Learning Algorithms

Last Updated on August 26, 2020

The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is a fast and easy procedure to perform, the results of which allow you to compare the performance of machine learning algorithms for your predictive modeling problem. Although simple to use and interpret, there are times when the procedure should not be used, such […]
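
As a quick illustration of the procedure, here is a minimal sketch using scikit-learn's train_test_split; the synthetic make_classification dataset, logistic regression model, and 33 percent test size are illustrative assumptions, not details taken from the tutorial.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic binary classification dataset (an illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# hold back 33 percent of the rows as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# fit the model on the training set only
model = LogisticRegression()
model.fit(X_train, y_train)
# estimate performance on data not used to train the model
yhat = model.predict(X_test)
print('Accuracy: %.3f' % accuracy_score(y_test, yhat))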

LOOCV for Evaluating Machine Learning Algorithms

Last Updated on August 26, 2020

The Leave-One-Out Cross-Validation, or LOOCV, procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is a computationally expensive procedure to perform, although it results in a reliable and unbiased estimate of model performance. Although simple to use, with no configuration to specify, there are times when the procedure should not be used, such as when you […]
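
A minimal sketch of LOOCV, assuming scikit-learn's LeaveOneOut splitter with cross_val_score; the small synthetic dataset and logistic regression model below are illustrative choices.

from numpy import mean
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# keep the dataset small: LOOCV fits one model per row
X, y = make_classification(n_samples=100, n_features=10, random_state=1)
cv = LeaveOneOut()
model = LogisticRegression()
# each fold holds out exactly one example for testing
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Accuracy: %.3f' % mean(scores))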

Nested Cross-Validation for Machine Learning with Python

Last Updated on August 28, 2020

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and dataset are used to both tune and select a model, it is likely to lead to an optimistically biased […]
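
A minimal sketch of nested cross-validation, assuming scikit-learn: a GridSearchCV (the inner loop, for tuning) is passed as the estimator to cross_val_score (the outer loop, for evaluation). The random forest model and the small grid are illustrative assumptions.

from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=1)
# inner loop tunes hyperparameters; outer loop estimates generalization
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
grid = {'n_estimators': [10, 100], 'max_features': [2, 4]}
search = GridSearchCV(RandomForestClassifier(random_state=1), grid, cv=inner_cv)
# the search itself is cross-validated, so tuning never sees the outer test folds
scores = cross_val_score(search, X, y, scoring='accuracy', cv=outer_cv, n_jobs=-1)
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))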

How to Configure k-Fold Cross-Validation

Last Updated on August 26, 2020

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm on a dataset. A common value for k is 10, but how do we know that this configuration is appropriate for our dataset and our algorithms? One approach is to explore the effect of different k values on the estimate of model performance and compare this to an ideal test condition. This can help to choose an […]
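
One way to sketch that comparison, assuming scikit-learn: treat LOOCV as the ideal but expensive test condition and measure how far the estimate from each candidate k lands from it. The dataset, model, and candidate k values below are illustrative.

from numpy import mean
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=100, n_features=20, random_state=1)
model = LogisticRegression()
# treat LOOCV as the ideal (but expensive) test condition
ideal = mean(cross_val_score(model, X, y, scoring='accuracy', cv=LeaveOneOut()))
print('Ideal (LOOCV): %.3f' % ideal)
# compare the estimate from each candidate k against the ideal
for k in [2, 3, 5, 10]:
    cv = KFold(n_splits=k, shuffle=True, random_state=1)
    score = mean(cross_val_score(model, X, y, scoring='accuracy', cv=cv))
    print('k=%2d: %.3f (gap to ideal: %.3f)' % (k, score, abs(score - ideal)))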

Repeated k-Fold Cross-Validation for Model Evaluation in Python

Last Updated on August 26, 2020

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the k-fold cross-validation procedure may result in a noisy estimate of model performance. Different splits of the data may result in very different results. Repeated k-fold cross-validation provides a way to improve the reliability of the estimated performance of a machine learning model. This involves simply repeating the cross-validation procedure multiple […]
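
A minimal sketch, assuming scikit-learn's RepeatedKFold splitter; the dataset, model, and choice of 3 repeats of 10 folds are illustrative assumptions.

from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# 10-fold cross-validation repeated 3 times, with different splits each repeat
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report the mean and spread across all 30 fold scores
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))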

How to use Seaborn Data Visualization for Machine Learning

Last Updated on August 19, 2020

Data visualization provides insight into the distribution of variables in a dataset and the relationships between them. This insight can be helpful in selecting data preparation techniques to apply prior to modeling and the types of algorithms that may be most suited to the data. Seaborn is a data visualization library for Python that runs on top of the popular Matplotlib data visualization library, although it provides a simpler interface and better-looking plots. In this tutorial, […]
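
As a small taste of the library (assuming seaborn 0.11 or later for histplot; the synthetic DataFrame is an illustrative stand-in for a real dataset):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# synthetic two-class dataset with a pair of correlated features
rng = np.random.default_rng(1)
n = 200
label = rng.integers(0, 2, n)
x1 = rng.normal(label * 2.0, 1.0, n)
x2 = x1 * 0.5 + rng.normal(0, 1.0, n)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'label': label})
# distribution of one variable, split by class
sns.histplot(data=df, x='x1', hue='label', kde=True)
plt.show()
# relationship between two variables, colored by class
sns.scatterplot(data=df, x='x1', y='x2', hue='label')
plt.show()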

Plot a Decision Surface for Machine Learning Algorithms in Python

Last Updated on August 26, 2020

Classification algorithms learn how to assign class labels to examples, although their decisions can appear opaque. A popular diagnostic for understanding the decisions made by a classification algorithm is the decision surface. This is a plot that shows how a fitted machine learning model makes predictions across a coarse grid covering the input feature space. A decision surface plot is a powerful tool for understanding how a given model “sees” the prediction task and how it […]
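
A minimal sketch of the idea, assuming scikit-learn and Matplotlib: define a coarse grid over a two-feature input space, predict a class label for every grid point, and plot the resulting regions. The blobs dataset and logistic regression model are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# two-feature dataset so the input space can be plotted directly
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=1)
model = LogisticRegression().fit(X, y)
# build a coarse grid spanning the input feature space
x1 = np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.1)
x2 = np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.1)
xx, yy = np.meshgrid(x1, x2)
grid = np.c_[xx.ravel(), yy.ravel()]
# predict a label for every grid point and shade the decision regions
zz = model.predict(grid).reshape(xx.shape)
plt.contourf(xx, yy, zz, cmap='Paired', alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='Paired', edgecolor='k')
plt.show()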

How to Calculate the Bias-Variance Trade-off with Python

Last Updated on August 26, 2020

The performance of a machine learning model can be characterized in terms of the bias and the variance of the model. A model with high bias makes strong assumptions about the form of the unknown underlying function that maps inputs to outputs in the dataset, such as linear regression. A model with high variance is highly dependent upon the specifics of the training dataset, such as unpruned decision trees. We desire models with low […]
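
One way to estimate the decomposition in practice is the bias_variance_decomp helper from the mlxtend library; its use here, with a synthetic regression dataset and a linear regression model, is an illustrative assumption rather than a prescription from the tutorial.

from mlxtend.evaluate import bias_variance_decomp
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# decompose the expected mean squared error into bias and variance terms
# by refitting the model on 200 bootstrap samples of the training set
mse, bias, var = bias_variance_decomp(
    LinearRegression(), X_train, y_train, X_test, y_test,
    loss='mse', num_rounds=200, random_seed=1)
print('MSE: %.3f  Bias: %.3f  Variance: %.3f' % (mse, bias, var))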

Scikit-Optimize for Hyperparameter Tuning in Machine Learning

Last Updated on September 7, 2020

Hyperparameter optimization refers to performing a search in order to discover the set of model configuration arguments that results in the best performance of the model on a given dataset. There are many ways to perform hyperparameter optimization, although modern methods, such as Bayesian Optimization, are fast and effective. The Scikit-Optimize library is an open-source Python library that provides an implementation of Bayesian Optimization that can be used to tune the hyperparameters of […]
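
A minimal sketch of the library's BayesSearchCV interface, which drops in where scikit-learn's GridSearchCV would otherwise go; the SVM model and the search space below are illustrative assumptions.

from skopt import BayesSearchCV
from skopt.space import Integer, Real
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=1)
# Bayesian optimization over the SVM hyperparameters:
# each iteration proposes a promising point based on previous evaluations
search = BayesSearchCV(
    estimator=SVC(),
    search_spaces={'C': Real(1e-6, 100.0, prior='log-uniform'),
                   'gamma': Real(1e-6, 100.0, prior='log-uniform'),
                   'degree': Integer(1, 5),
                   'kernel': ['linear', 'poly', 'rbf']},
    n_iter=32, cv=5, random_state=1)
search.fit(X, y)
print('Best score: %.3f' % search.best_score_)
print('Best params: %s' % search.best_params_)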
