How to Handle Big-p, Little-n (p >> n) in Machine Learning

Last Updated on August 19, 2020 What if I have more columns than rows in my dataset? Machine learning datasets are often structured or tabular data composed of rows and columns. The columns that are fed as input to a model are called predictors or “p” and the rows are samples “n”. Most machine learning algorithms assume that there are many more samples than there are predictors, denoted as p << n. These problems often require specialized data preparation and […]
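
To make the shape of the problem concrete, here is a minimal sketch that builds a p >> n dataset in Python; scikit-learn's make_classification and the sizes chosen are illustrative assumptions, not from the excerpt:

```python
# Illustrative sketch: a dataset with far more columns (predictors, p)
# than rows (samples, n). Library and sizes are assumptions.
from sklearn.datasets import make_classification

# n = 20 samples but p = 100 predictors, i.e. p >> n
X, y = make_classification(n_samples=20, n_features=100,
                           n_informative=10, n_redundant=90,
                           random_state=1)
print(X.shape)  # (20, 100): 20 rows, 100 columns
```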

Read more

How to Develop Voting Ensembles With Python

Last Updated on September 7, 2020 Voting is an ensemble machine learning algorithm. For regression, a voting ensemble involves making a prediction that is the average of multiple other regression models. In classification, a hard voting ensemble involves summing the votes for crisp class labels from other models and predicting the class with the most votes. A soft voting ensemble involves summing the predicted probabilities for class labels and predicting the class label with the largest summed probability. In this […]
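
As a hedged sketch of the two voting schemes described above, the snippet below uses scikit-learn's VotingClassifier; the member models and the synthetic dataset are illustrative assumptions:

```python
# Illustrative sketch of hard vs. soft voting; members and data are assumed.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
members = [('lr', LogisticRegression()),
           ('knn', KNeighborsClassifier()),
           ('cart', DecisionTreeClassifier())]

# hard voting: predict the class with the most votes
hard = VotingClassifier(estimators=members, voting='hard').fit(X, y)
# soft voting: predict the class with the largest summed probability
soft = VotingClassifier(estimators=members, voting='soft').fit(X, y)
print(hard.predict(X[:3]), soft.predict(X[:3]))
```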

Read more

How to Develop a Random Forest Ensemble in Python

Last Updated on September 7, 2020 Random forest is an ensemble machine learning algorithm. It is perhaps the most popular and widely used machine learning algorithm given its good or excellent performance across a wide range of classification and regression predictive modeling problems. It is also easy to use given that it has few key hyperparameters and sensible heuristics for configuring these hyperparameters. In this tutorial, you will discover how to develop a random forest ensemble for classification and regression. […]
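
A minimal sketch of fitting and evaluating a random forest, assuming scikit-learn's RandomForestClassifier and a synthetic dataset (both are illustrative choices, not from the excerpt):

```python
# Illustrative sketch of a random forest for classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# key hyperparameters: number of trees and features considered per split
model = RandomForestClassifier(n_estimators=100, max_features='sqrt')
scores = cross_val_score(model, X, y, cv=5)
print('Mean accuracy: %.3f' % scores.mean())
```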

Read more

How to Develop an Extra Trees Ensemble with Python

Last Updated on August 17, 2020 Extra Trees is an ensemble machine learning algorithm that combines the predictions from many decision trees. It is related to the widely used random forest algorithm. It can often achieve as good or better performance than the random forest algorithm, although it uses a simpler algorithm to construct the decision trees used as members of the ensemble. It is also easy to use given that it has few key hyperparameters and sensible heuristics for configuring […]
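
A minimal sketch using scikit-learn's ExtraTreesClassifier (an assumed implementation; dataset and settings are illustrative):

```python
# Illustrative sketch of an extra trees ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=2)
# like random forest, but member trees use random split points and, by
# default in scikit-learn, are fit on the whole training set rather than
# bootstrap samples
model = ExtraTreesClassifier(n_estimators=100)
print('Mean accuracy: %.3f' % cross_val_score(model, X, y, cv=5).mean())
```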

Read more

A Gentle Introduction to Degrees of Freedom in Machine Learning

Last Updated on August 19, 2020 Degrees of freedom is an important concept from statistics and engineering. It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic, or used in a statistical hypothesis test. In machine learning, the degrees of freedom may refer to the number of parameters in the model, such as the number of coefficients in a linear regression model or the number of weights in a […]
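
As a small worked example of counting model parameters, the sketch below fits a linear regression and counts its coefficients plus the intercept; scikit-learn and the data sizes are illustrative assumptions:

```python
# Illustrative sketch: a linear regression with 10 inputs learns
# 10 coefficients plus one intercept, i.e. 11 parameters.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=10, random_state=1)
model = LinearRegression().fit(X, y)
# one learned coefficient per input feature, plus the intercept
n_params = model.coef_.size + 1
print('Model parameters:', n_params)  # 11
```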

Read more

How to Develop a Bagging Ensemble with Python

Last Updated on September 7, 2020 Bagging is an ensemble machine learning algorithm that combines the predictions from many decision trees. It is also easy to implement given that it has few key hyperparameters and sensible heuristics for configuring these hyperparameters. Bagging performs well in general and provides the basis for a whole family of decision tree ensemble algorithms, such as the popular random forest and extra trees ensembles, as well as the lesser-known Pasting, Random Subspaces, and […]
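
A minimal sketch of bagged decision trees, assuming scikit-learn's BaggingClassifier and a synthetic dataset (illustrative choices only):

```python
# Illustrative sketch of bagging (bootstrap aggregation) of decision trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)
# in scikit-learn, each member is a decision tree fit on a bootstrap
# sample of the training set by default
model = BaggingClassifier(n_estimators=100)
print('Mean accuracy: %.3f' % cross_val_score(model, X, y, cv=5).mean())
```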

Read more

Difference Between Algorithm and Model in Machine Learning

Last Updated on August 19, 2020 Machine learning involves the use of machine learning algorithms and models. For beginners this can be confusing, as “machine learning algorithm” is often used interchangeably with “machine learning model.” Are they the same thing or something different? As a developer, your intuition with “algorithms” like sort algorithms and search algorithms will help to clear up this confusion. In this post, you will discover the difference between machine learning “algorithms” and “models.” After reading this […]
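
One way to make the distinction concrete is the hedged sketch below: the algorithm is the fitting procedure, and the model is the artifact it outputs; scikit-learn's LinearRegression is used purely for illustration:

```python
# Illustrative sketch: algorithm = the learning procedure,
# model = the output of running that procedure on data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=3, random_state=1)
algorithm = LinearRegression()   # the procedure, not yet run on data
model = algorithm.fit(X, y)      # running the algorithm produces a model
# the "model" is the learned parameters: coefficients plus an intercept
print(model.coef_, model.intercept_)
```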

Read more

How to Develop an AdaBoost Ensemble in Python

Last Updated on August 13, 2020 Boosting is a class of ensemble machine learning algorithms that involve combining the predictions from many weak learners. A weak learner is a model that is very simple, although it has some skill on the dataset. Boosting was a theoretical concept long before a practical algorithm could be developed, and the AdaBoost (adaptive boosting) algorithm was the first successful approach for the idea. The AdaBoost algorithm involves using very short (one-level) decision trees as weak […]
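
A minimal sketch of AdaBoost with one-level decision trees (stumps) as the weak learners, assuming scikit-learn's AdaBoostClassifier and a synthetic dataset:

```python
# Illustrative sketch of AdaBoost with decision stumps as weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=4)
# scikit-learn's default weak learner is a decision stump (max_depth=1)
model = AdaBoostClassifier(n_estimators=50)
print('Mean accuracy: %.3f' % cross_val_score(model, X, y, cv=5).mean())
```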

Read more

How to Develop a Gradient Boosting Machine Ensemble in Python

Last Updated on September 7, 2020 The Gradient Boosting Machine is a powerful ensemble machine learning algorithm that uses decision trees. Boosting is a general ensemble technique that involves sequentially adding models to the ensemble where subsequent models correct the performance of prior models. AdaBoost was the first algorithm to deliver on the promise of boosting. Gradient boosting is a generalization of AdaBoost, improving the performance of the approach and introducing ideas from bootstrap aggregation to further improve the models, […]
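
A minimal sketch using scikit-learn's GradientBoostingClassifier; the hyperparameter values are illustrative assumptions:

```python
# Illustrative sketch of a gradient boosting machine.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=5)
# trees are added sequentially, each correcting its predecessors;
# subsample < 1.0 borrows the bootstrap aggregation idea
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   subsample=0.8)
print('Mean accuracy: %.3f' % cross_val_score(model, X, y, cv=5).mean())
```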

Read more

Introduction to Dimensionality Reduction for Machine Learning

Last Updated on June 30, 2020 The number of input variables or features for a dataset is referred to as its dimensionality. Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. More input features often make a predictive modeling task more challenging, a problem generally referred to as the curse of dimensionality. High-dimensionality statistics and dimensionality reduction techniques are often used for data visualization. Nevertheless, these techniques can be used in applied machine […]
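
As one hedged example of a dimensionality reduction technique, the sketch below projects a dataset onto fewer components with PCA; scikit-learn, the dataset, and the choice of 5 components are illustrative assumptions:

```python
# Illustrative sketch: reduce 20 input variables to 5 with PCA.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=1000, n_features=20, random_state=6)
pca = PCA(n_components=5)              # project 20 inputs down to 5
X_reduced = pca.fit_transform(X)
print(X.shape, '->', X_reduced.shape)  # (1000, 20) -> (1000, 5)
```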

Read more