Statistics in Plain English for Machine Learning

Last Updated on August 8, 2019

There is an ocean of books on statistics; where do you start? A beginner book on statistics often suffers from one of two common problems. It may be a mathematical textbook filled with derivations, special cases, and proofs for each statistical method, with little intuition for the method or guidance on how to use it. Or it may be a playbook for a proprietary or […]

How to Calculate Nonparametric Rank Correlation in Python

Last Updated on August 8, 2019

Correlation is a measure of the association between two variables. It is easy to calculate and interpret when both variables have a well-understood Gaussian distribution. When we do not know the distribution of the variables, we must use nonparametric rank correlation methods. In this tutorial, you will discover rank correlation methods for quantifying the association between variables with a non-Gaussian distribution. After completing this tutorial, you will know: How rank correlation methods work […]
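As a rough illustration of the idea in this excerpt, the sketch below computes Spearman's rank correlation by replacing each value with its rank and then applying the ordinary Pearson formula to the ranks. It is a minimal sketch and not the tutorial's own code: the function names `rank`, `pearson`, and `spearman` and the sample data are assumptions made for this example.

```python
from statistics import mean


def rank(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman(x, y))
```

Because only the ordering of the values matters, a monotonic but nonlinear relationship like the one above still yields a perfect rank correlation.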

A Gentle Introduction to Effect Size Measures in Python

Last Updated on August 8, 2019

Statistical hypothesis tests report on the likelihood of the observed results given an assumption, such as no association between variables or no difference between groups. Hypothesis tests do not comment on the size of the effect if the association or difference is statistically significant. This highlights the need for standard ways of calculating and reporting a result. Effect size methods refer to a suite of statistical tools from the field of estimation statistics […]

A Gentle Introduction to Statistical Power and Power Analysis in Python

Last Updated on April 24, 2020

The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true effect present to detect. Power can be calculated and reported for a completed experiment to comment on the confidence one might have in the conclusions drawn from the results of the study. It can also be used as a tool to estimate the number of observations or sample size required in order to detect an […]
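To make the two uses of power in this excerpt concrete, the sketch below approximates the power of a two-sided, two-sample comparison with a normal (z-test) approximation, then searches for the smallest per-group sample size that reaches a target power. This is an assumption-laden sketch rather than the tutorial's method: the function names `approx_power` and `required_n` are hypothetical, the effect size is taken to be a standardized mean difference, and an exact t-test-based calculation would give slightly different numbers.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference between groups
    (an assumption of this sketch).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def required_n(effect_size, power=0.8, alpha=0.05):
    """Smallest per-group sample size whose approximate power meets the target."""
    n = 2
    while approx_power(effect_size, n, alpha) < power:
        n += 1
    return n


# Hypothetical scenario: a large standardized effect of 0.8.
print(approx_power(0.8, 25))
print(required_n(0.8))
```

When the effect size is zero, the function returns the significance level itself, which is a quick sanity check that the rejection regions are set up correctly.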

All of Statistics for Machine Learning

Last Updated on August 8, 2019

A foundation in statistics is required to be effective as a machine learning practitioner. The book “All of Statistics” was written specifically to provide a foundation in probability and statistics for computer science undergraduates who may have an interest in data mining and machine learning. As such, it is often recommended to machine learning practitioners interested in expanding their understanding of statistics. In this post, you will discover the book “All […]

The Role of Randomization to Address Confounding Variables in Machine Learning

Last Updated on July 31, 2020

A large part of applied machine learning is about running controlled experiments to discover which algorithm or algorithm configuration to use on a predictive modeling problem. A challenge is that there are aspects of the problem and the algorithm, called confounding variables, that cannot be controlled (held constant) and must be controlled for. An example is the use of randomness in a learning algorithm, such as random initialization or random choices during learning. The solution […]

How to Calculate McNemar’s Test to Compare Two Machine Learning Classifiers

Last Updated on August 8, 2019

The choice of a statistical hypothesis test is a challenging open problem for interpreting machine learning results. In his widely cited 1998 paper, Thomas Dietterich recommended McNemar’s test in those cases where it is expensive or impractical to train multiple copies of classifier models. This describes the current situation with deep learning models that are both very large and trained and evaluated on large datasets, often requiring days or weeks to train […]

How to Code the Student’s t-Test from Scratch in Python

Last Updated on August 8, 2019

Perhaps one of the most widely used statistical hypothesis tests is the Student’s t-test. Because you may use this test yourself someday, it is important to have a deep understanding of how the test works. As a developer, this understanding is best achieved by implementing the hypothesis test yourself from scratch. In this tutorial, you will discover how to implement the Student’s t-test statistical hypothesis test from scratch in Python. After completing this […]
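In the spirit of implementing the test from scratch, here is a minimal sketch of the test statistic for the independent-samples Student’s t-test, assuming equal variances in the two groups. The excerpt does not say which variant the tutorial implements, so that choice, the function name `independent_t_statistic`, and the sample data are all assumptions of this example. The p-value is left out because it needs the t-distribution's cumulative distribution function, which is not in the Python standard library.

```python
from statistics import mean, variance


def independent_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Uses the pooled-variance form, which assumes both groups share
    the same population variance.
    """
    n1, n2 = len(a), len(b)
    degrees_of_freedom = n1 + n2 - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / degrees_of_freedom
    # Standard error of the difference between the two sample means.
    std_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t = (mean(a) - mean(b)) / std_error
    return t, degrees_of_freedom


# Hypothetical samples whose means differ by one unit.
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 3, 4, 5, 6]
t_stat, dof = independent_t_statistic(sample_a, sample_b)
print(t_stat, dof)
```

The statistic would then be compared against a critical value of the t-distribution with the returned degrees of freedom to decide whether the difference in means is significant.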

Statistics for Machine Learning (7-Day Mini-Course)

Last Updated on August 8, 2019

Statistics for Machine Learning Crash Course. Get on top of the statistics used in machine learning in 7 days. Statistics is a field of mathematics that is universally agreed to be a prerequisite for a deeper understanding of machine learning. Although statistics is a large field with many esoteric theories and findings, the nuts-and-bolts tools and notations taken from the field are required for machine learning practitioners. With a solid foundation of […]

17 Statistical Hypothesis Tests in Python (Cheat Sheet)

Last Updated on November 28, 2019

Quick-reference guide to the 17 statistical hypothesis tests that you need in applied machine learning, with sample code in Python. Although there are hundreds of statistical hypothesis tests that you could use, there is only a small subset that you may need to use in a machine learning project. In this post, you will discover a cheat sheet for the most popular statistical hypothesis tests for a machine learning project with examples using the Python […]
