# Do Not Use Random Guessing As Your Baseline Classifier

Last Updated on September 25, 2019

I recently received the following question via email: "Hi Jason, quick question. A case of class imbalance: 90 cases of thumbs up, 10 cases of thumbs down. How would we calculate random guessing accuracy in this case?" We can answer this question using some basic probability (I opened Excel and typed in some numbers). […]
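As a rough sketch of the arithmetic behind the question, the expected accuracy of a guesser that ignores the input is the sum, over classes, of the class probability times the probability of guessing that class. The function name `expected_accuracy` and the three guessing strategies compared here are illustrative assumptions, not taken from the post itself.

```python
def expected_accuracy(class_priors, guess_probs):
    """Expected accuracy of a guesser that ignores the input.

    class_priors[i] is P(true class = i); guess_probs[i] is P(guess = i).
    A guess is correct when both land on the same class.
    """
    return sum(p * q for p, q in zip(class_priors, guess_probs))


# 90 thumbs up, 10 thumbs down, as in the emailed question.
priors = [0.9, 0.1]

uniform = expected_accuracy(priors, [0.5, 0.5])   # coin flip between classes
stratified = expected_accuracy(priors, priors)    # guess in proportion to the data
majority = expected_accuracy(priors, [1.0, 0.0])  # always predict thumbs up

print(f"uniform guess:    {uniform:.2f}")
print(f"stratified guess: {stratified:.2f}")
print(f"majority class:   {majority:.2f}")
```

Under these assumptions, always predicting the majority class scores higher than either form of random guessing, which is consistent with the title's advice.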

# How to Use ROC Curves and Precision-Recall Curves for Classification in Python

Last Updated on August 22, 2020

It can be more flexible to predict probabilities of an observation belonging to each class in a classification problem rather than predicting classes directly. This flexibility comes from the way that probabilities may be interpreted using different thresholds, which allow the operator of the model to trade off concerns in the errors made by the model, such as the number of false positives compared to the number of false negatives. This is required when using […]
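To make the threshold trade-off concrete, here is a minimal sketch that counts false positives and false negatives at a few thresholds. The labels and probabilities are made-up toy values, the helper `error_counts` is hypothetical, and treating `p >= threshold` as a positive prediction is an assumed convention.

```python
def error_counts(y_true, y_prob, threshold):
    """Return (false_positives, false_negatives) at a given threshold.

    A probability at or above the threshold is treated as a positive prediction.
    """
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    return fp, fn


# Toy data for illustration only: true labels and predicted P(class = 1).
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.3, 0.45, 0.4, 0.6, 0.55, 0.8, 0.9]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = error_counts(y_true, y_prob, threshold)
    print(f"threshold={threshold:.1f}  false positives={fp}  false negatives={fn}")
```

On this toy data, raising the threshold removes false positives at the cost of more false negatives, which is the trade-off the operator chooses between.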

# How and When to Use a Calibrated Classification Model with scikit-learn

Last Updated on September 25, 2019

Instead of predicting class values directly for a classification problem, it can be convenient to predict the probability of an observation belonging to each possible class. Predicting probabilities allows some flexibility including deciding how to interpret the probabilities, presenting predictions with uncertainty, and providing more nuanced ways to evaluate the skill of the model. Predicted probabilities that match the expected distribution of probabilities for each class are referred to as calibrated. The problem is, […]

# A Gentle Introduction to Probability Scoring Methods in Python

Last Updated on December 31, 2019

How to Score Probability Predictions in Python and Develop an Intuition for Different Metrics. Predicting probabilities instead of class labels for a classification problem can provide additional nuance and uncertainty for the predictions. The added nuance allows more sophisticated metrics to be used to interpret and evaluate the predicted probabilities. In general, methods for the evaluation of the accuracy of predicted probabilities are referred to as scoring rules or scoring functions. In this tutorial, you […]
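As an illustration of what a scoring rule looks like, the sketch below implements two widely used ones for binary problems, the Brier score and log loss, in plain Python. The excerpt does not say which metrics the tutorial covers, so the choice of these two is an assumption, and the toy predictions are invented for the example.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob):
    """Mean negative log-likelihood of the true labels.

    Assumes every probability is strictly between 0 and 1.
    """
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, y_prob)
    )
    return -total / len(y_true)


# Toy labels and two hypothetical sets of predicted P(class = 1).
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # close to the true labels
hedged = [0.6, 0.4, 0.6, 0.4]     # right direction, less certain

for name, probs in (("confident", confident), ("hedged", hedged)):
    print(f"{name}: brier={brier_score(y_true, probs):.3f} "
          f"log_loss={log_loss(y_true, probs):.3f}")
```

Both scores are lower-is-better, so the confident and correct predictions score better than the hedged ones on this toy data.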

# A Gentle Introduction to Jensen’s Inequality

Last Updated on July 31, 2020

It is common in statistics and machine learning to create a linear transform or mapping of a variable. An example is a linear scaling of a feature variable. We have the natural intuition that the mean of the scaled values is the same as the scaled value of the mean raw variable values. This makes sense. Unfortunately, we bring this intuition with us when using nonlinear transformations of variables where this relationship no longer […]
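A small numeric check can make the intuition concrete. The sketch below uses made-up values, a linear scaling `2x + 1`, and squaring as the nonlinear (convex) transform; all three choices are assumptions made for illustration.

```python
import math
from statistics import mean

# Toy values for illustration only.
values = [1, 2, 3, 4, 10]


def linear(x):
    """A linear scaling of the variable."""
    return 2 * x + 1


def square(x):
    """A nonlinear, convex transform of the variable."""
    return x ** 2


# Linear transform: the mean of the transform equals the transform of the mean.
mean_of_linear = mean(linear(x) for x in values)
linear_of_mean = linear(mean(values))
print(mean_of_linear, linear_of_mean)

# Convex transform: the mean of the transform is at least the transform of the mean.
mean_of_square = mean(square(x) for x in values)
square_of_mean = square(mean(values))
print(mean_of_square, square_of_mean)

assert math.isclose(mean_of_linear, linear_of_mean)
assert mean_of_square >= square_of_mean
```

For a convex function such as squaring, the gap in the second comparison always goes in the same direction, which is what Jensen's inequality states.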

# How to Develop and Evaluate Naive Classifier Strategies Using Probability

Last Updated on September 25, 2019

A Naive Classifier is a simple classification model that assumes little to nothing about the problem, and whose performance provides a baseline against which all other models evaluated on a dataset can be compared. There are different strategies that can be used for a naive classifier, and some are better than others, depending on the dataset and the choice of performance measures. The most common performance measure is classification accuracy and common […]

# Resources for Getting Started With Probability in Machine Learning

Last Updated on September 25, 2019

Machine Learning is a field of computer science concerned with developing systems that can learn from data. Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty. Many aspects of machine learning are uncertain, including, most critically, observations from the problem domain and the relationships learned by models from that data. As such, some understanding of probability and tools and […]

# 5 Reasons to Learn Probability for Machine Learning

Last Updated on November 8, 2019

Probability is a field of mathematics that quantifies uncertainty. It is undeniably a pillar of the field of machine learning, and many recommend it as a prerequisite subject to study prior to getting started. This is misleading advice, as probability makes more sense to a practitioner once they have the context of the applied machine learning process in which to interpret it. In this post, you will discover why machine learning practitioners should study […]