Sensitivity Analysis of Dataset Size vs. Model Performance

Machine learning model performance often improves with dataset size for predictive modeling. How much it improves depends on the specific dataset and the choice of model, but in general using more data can result in better performance, and findings made when estimating model performance on smaller datasets often scale up to larger datasets. The problem is that this relationship is unknown for a given dataset and model, and may not exist at all for some datasets and models. Additionally, if such a […]
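
A minimal sketch of such a sensitivity analysis, assuming a synthetic classification task from make_classification and a decision tree; the sample sizes and model are illustrative choices, not the article's:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# estimate model performance at increasing dataset sizes
for n in [100, 500, 1000, 5000, 10000]:
    # create a synthetic dataset of the given size
    X, y = make_classification(n_samples=n, n_features=20, random_state=1)
    # estimate performance with 10-fold cross-validation
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10, scoring='accuracy')
    print('n=%d: mean accuracy %.3f' % (n, scores.mean()))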

Read more

Regression Metrics for Machine Learning

Regression refers to predictive modeling problems that involve predicting a numeric value, as opposed to classification, which involves predicting a class label. Unlike classification, you cannot use classification accuracy to evaluate the predictions made by a regression model; instead, you must use error metrics specifically designed for evaluating predictions on regression problems. In this tutorial, you will discover how to calculate error metrics for regression predictive modeling projects. After completing this tutorial, you will know: Regression predictive modeling […]
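
A minimal sketch of computing common regression error metrics with scikit-learn, assuming a synthetic problem from make_regression and a linear regression model as illustrative choices:

from math import sqrt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

# synthetic regression problem and a simple train/test split
X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
model = LinearRegression().fit(X_train, y_train)
yhat = model.predict(X_test)
# common error metrics for regression predictions
mse = mean_squared_error(y_test, yhat)
print('MSE:  %.3f' % mse)
print('RMSE: %.3f' % sqrt(mse))
print('MAE:  %.3f' % mean_absolute_error(y_test, yhat))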

Read more

A Gentle Introduction to Machine Learning Modeling Pipelines

Applied machine learning is typically focused on finding a single model that performs well or best on a given dataset. Effective use of that model requires appropriate preparation of the input data and tuning of the model's hyperparameters. Collectively, the linear sequence of steps required to prepare the data, tune the model, and transform the predictions is called the modeling pipeline. Modern machine learning libraries like the scikit-learn Python library allow this sequence of steps to be defined and […]
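
A minimal sketch of such a pipeline in scikit-learn, assuming an illustrative combination of MinMaxScaler and logistic regression on a synthetic dataset; evaluating the pipeline as a single object keeps the data preparation inside each cross-validation fold:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# chain data preparation and the model into a single pipeline
pipeline = Pipeline(steps=[('scaler', MinMaxScaler()), ('model', LogisticRegression())])
# the whole pipeline is cross-validated, so scaling is fit only on each training fold
scores = cross_val_score(pipeline, X, y, cv=10, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())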

Read more

Semi-Supervised Learning With Label Spreading

Semi-supervised learning refers to algorithms that attempt to make use of both labeled and unlabeled training data. This sets them apart from supervised learning algorithms, which can only learn from labeled training data. A popular approach to semi-supervised learning is to create a graph that connects the examples in the training dataset and then propagate known labels through the edges of the graph to label the unlabeled examples. An example of this approach to semi-supervised learning is the label spreading algorithm […]
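
A minimal sketch of label spreading with scikit-learn's LabelSpreading class, assuming a synthetic dataset in which 70 percent of the examples are marked unlabeled with -1; the split ratio is an illustrative choice:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelSpreading

# synthetic dataset with a small labeled portion and a large unlabeled portion
X, y = make_classification(n_samples=1000, random_state=1)
X_lab, X_unlab, y_lab, y_unlab = train_test_split(X, y, test_size=0.7, random_state=1)
X_train = np.vstack((X_lab, X_unlab))
# unlabeled examples are marked with the placeholder label -1
y_train = np.concatenate((y_lab, np.full(len(y_unlab), -1)))
model = LabelSpreading()
model.fit(X_train, y_train)
# labels assigned to the unlabeled examples during fitting
predicted = model.transduction_[len(y_lab):]
print('Accuracy on unlabeled examples: %.3f' % (predicted == y_unlab).mean())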

Read more

Semi-Supervised Learning With Label Propagation

Semi-supervised learning refers to algorithms that attempt to make use of both labeled and unlabeled training data. This sets them apart from supervised learning algorithms, which can only learn from labeled training data. A popular approach to semi-supervised learning is to create a graph that connects the examples in the training dataset and then propagate known labels through the edges of the graph to label the unlabeled examples. An example of this approach to semi-supervised learning is the label propagation algorithm […]
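
A minimal sketch of the same idea with scikit-learn's LabelPropagation class, under the same illustrative assumptions (a synthetic dataset with -1 marking the unlabeled examples):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelPropagation

X, y = make_classification(n_samples=1000, random_state=2)
X_lab, X_unlab, y_lab, y_unlab = train_test_split(X, y, test_size=0.7, random_state=2)
# unlabeled examples get the placeholder label -1
y_mixed = np.concatenate((y_lab, np.full(len(y_unlab), -1)))
model = LabelPropagation()
model.fit(np.vstack((X_lab, X_unlab)), y_mixed)
# the fitted model can also predict labels for held-out examples
print('Accuracy on the unlabeled portion: %.3f' % (model.predict(X_unlab) == y_unlab).mean())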

Read more

Multinomial Logistic Regression With Python

Multinomial logistic regression is an extension of logistic regression that adds native support for multi-class classification problems. Logistic regression, by default, is limited to two-class classification problems. Some extensions, like one-vs-rest, allow logistic regression to be used for multi-class classification, although they require that the classification problem first be transformed into multiple binary classification problems. Instead, the multinomial logistic regression algorithm is an extension of the logistic regression model that involves changing the loss function to cross-entropy loss […]
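
A minimal sketch of multinomial logistic regression in scikit-learn, assuming a synthetic three-class dataset; note that recent scikit-learn releases have deprecated the multi_class argument, with multinomial already the default for multi-class problems:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic three-class classification problem
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=3, random_state=1)
# a single model optimized with cross-entropy loss over all classes
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')
scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())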

Read more

How to Identify Overfitting Machine Learning Models in Scikit-Learn

Overfitting is a common explanation for the poor performance of a predictive model. An analysis of learning dynamics can help to identify whether a model has overfit the training dataset and may suggest an alternate configuration that could result in better predictive performance. Performing an analysis of learning dynamics is straightforward for algorithms that learn incrementally, like neural networks, but it is less clear how we might perform the same analysis with […]
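
A minimal sketch of one such analysis, assuming a synthetic dataset and a decision tree whose depth stands in for model capacity (an illustrative choice): a training score that keeps rising while the test score stalls or drops suggests overfitting:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# compare train and test accuracy as model capacity (tree depth) grows;
# a widening gap between the two suggests overfitting
for depth in range(1, 21):
    model = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print('depth=%2d train=%.3f test=%.3f' % (depth, train_acc, test_acc))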

Read more

A Gentle Introduction to PyCaret for Machine Learning

PyCaret is an open source Python machine learning library designed to make standard tasks in a machine learning project easy to perform. It is a Python version of the Caret machine learning package in R, which is popular because it allows models to be evaluated, compared, and tuned on a given dataset with just a few lines of code. The PyCaret library provides these features, allowing the machine learning practitioner in Python to spot-check a suite of standard machine learning algorithms on […]
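
A minimal sketch of spot-checking models with PyCaret's classification module, assuming a pandas DataFrame with a column named 'target'; the setup() and compare_models() calls follow the basic PyCaret workflow, though exact arguments vary across PyCaret versions:

from pandas import DataFrame
from sklearn.datasets import make_classification
from pycaret.classification import setup, compare_models

# build a DataFrame with named feature columns and a 'target' column
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
df = DataFrame(X)
df.columns = ['x%d' % i for i in range(X.shape[1])]
df['target'] = y
# initialize the PyCaret experiment, then spot-check a suite of algorithms
setup(data=df, target='target', session_id=1)
best = compare_models()
print(best)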

Read more

Perceptron Algorithm for Classification in Python

The Perceptron is a linear machine learning algorithm for binary classification tasks. It may be considered one of the first and one of the simplest types of artificial neural networks. It is definitely not “deep” learning but is an important building block. Like logistic regression, it can quickly learn a linear separation in feature space for two-class classification tasks, although unlike logistic regression, it learns using the stochastic gradient descent optimization algorithm and does not predict calibrated probabilities. In this […]
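
A minimal sketch of the Perceptron in scikit-learn, assuming a synthetic binary classification dataset; eta0 (the learning rate) and max_iter are left at illustrative values:

from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score

# synthetic two-class classification problem
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
# the Perceptron learns a linear decision boundary via stochastic gradient descent
model = Perceptron(eta0=1.0, max_iter=1000)
scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())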

Read more

How to Develop LARS Regression Models in Python

Regression is a modeling task that involves predicting a numeric value given an input. Linear regression is the standard algorithm for regression that assumes a linear relationship between inputs and the target variable. An extension to linear regression involves adding penalties to the loss function during training that encourage simpler models that have smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression. Lasso Regression is a popular type of regularized linear regression that […]
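
A minimal sketch of a LARS-based regularized regression in scikit-learn, using LassoLars (Lasso solved with the least angle regression algorithm) on a synthetic dataset; the alpha value is an illustrative choice:

from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLars
from sklearn.model_selection import cross_val_score

# synthetic regression problem
X, y = make_regression(n_samples=1000, n_features=20, noise=0.5, random_state=1)
# regularized linear regression fitted with the least angle regression algorithm
model = LassoLars(alpha=0.01)
scores = cross_val_score(model, X, y, cv=10, scoring='neg_mean_absolute_error')
print('Mean MAE: %.3f' % -scores.mean())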

Read more