Prediction Intervals for Machine Learning

Last Updated on May 1, 2020

A prediction from a machine learning perspective is a single point that hides the uncertainty of that prediction. Prediction intervals provide a way to quantify and communicate the uncertainty in a prediction. They are different from confidence intervals, which instead seek to quantify the uncertainty in a population parameter such as a mean or standard deviation. Prediction intervals describe the uncertainty for a single specific outcome. In this tutorial, you will discover the prediction […]
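
As a rough illustration of the idea in this excerpt, the sketch below builds an approximate prediction interval around one new point prediction. It assumes the model's errors are roughly Gaussian and uses the spread of past residuals as the uncertainty estimate. The function name `gaussian_prediction_interval` and the toy numbers are hypothetical and are not taken from the tutorial.

```python
from statistics import NormalDist, stdev


def gaussian_prediction_interval(y_true, y_pred, new_prediction, coverage=0.95):
    """Approximate interval for one new prediction (assumes Gaussian errors)."""
    # Residuals on data the model has already been evaluated on.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # Sample standard deviation of the residuals is the spread estimate.
    spread = stdev(residuals)
    # Two-sided Gaussian critical value, about 1.96 for 95% coverage.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return new_prediction - z * spread, new_prediction + z * spread


# Hypothetical observed values and model predictions.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
lower, upper = gaussian_prediction_interval(observed, predicted, new_prediction=20.0)
print(lower, upper)
```

The interval is centred on the point prediction. A wider interval signals more uncertainty about that single outcome.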

A Gentle Introduction to Statistical Tolerance Intervals in Machine Learning

Last Updated on August 8, 2019

It can be useful to have an upper and lower limit on data. These bounds can be used to help identify anomalies and to set expectations. A bound on observations from a population is called a tolerance interval. A tolerance interval comes from the field of estimation statistics. A tolerance interval is different from a prediction interval, which quantifies the uncertainty for a single predicted value. It is also different from […]

A Gentle Introduction to Estimation Statistics for Machine Learning

Last Updated on August 8, 2019

Statistical hypothesis tests can be used to indicate whether the difference between two samples is due to random chance, but cannot comment on the size of the difference. A group of methods referred to as “new statistics” are seeing increased use instead of or in addition to p-values in order to quantify the magnitude of effects and the amount of uncertainty for estimated values. This group of statistical methods is referred to as “estimation […]

A Gentle Introduction to Data Visualization Methods in Python

Last Updated on August 23, 2019

Sometimes data does not make sense until you can look at it in a visual form, such as with charts and plots. Being able to quickly visualize your data samples for yourself and others is an important skill both in applied statistics and in applied machine learning. In this tutorial, you will discover the five types of plots that you will need to know when visualizing data in Python and how to use them to […]

A Gentle Introduction to Statistical Data Distributions

Last Updated on August 8, 2019

A sample of data will form a distribution, and by far the most well-known distribution is the Gaussian distribution, often called the Normal distribution. The distribution provides a parameterized mathematical function that can be used to calculate the probability for any individual observation from the sample space. This distribution describes the grouping or the density of the observations, called the probability density function. We can also calculate the likelihood of an observation having a […]
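
To make the probability density function mentioned above concrete, here is a minimal sketch that evaluates the Gaussian density at a few points using Python's standard library. The choice of a standard Gaussian (mean 0, standard deviation 1) and the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

# Standard Gaussian: mean 0 and standard deviation 1 (an illustrative choice).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Evaluate the probability density function at a few observations.
points = [-2.0, -1.0, 0.0, 1.0, 2.0]
densities = [standard_normal.pdf(x) for x in points]

for x, density in zip(points, densities):
    print(f"x={x:+.1f} density={density:.4f}")
```

The density is highest at the mean and symmetric around it, which is the familiar bell shape.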

How to Calculate Critical Values for Statistical Hypothesis Testing with Python

Last Updated on September 24, 2019

It is common, if not standard, to interpret the results of statistical hypothesis tests using a p-value. Not all implementations of statistical tests return p-values. In some cases, you must use alternatives, such as critical values. In addition, critical values are used when estimating the expected intervals for observations from a population, such as in tolerance intervals. In this tutorial, you will discover critical values, why they are important, how they are used, and […]
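
As a small example of what a critical value looks like in practice, the sketch below computes Gaussian critical values from the inverse cumulative distribution function in Python's standard library. The helper name `gaussian_critical_value` is hypothetical, and the Gaussian is just one of several distributions for which critical values are used.

```python
from statistics import NormalDist


def gaussian_critical_value(alpha=0.05, two_sided=True):
    """Critical value of the standard Gaussian for significance level alpha."""
    # A two-sided test splits alpha evenly between the two tails.
    tail = alpha / 2.0 if two_sided else alpha
    # The inverse CDF (percent point function) gives the cut-off value.
    return NormalDist().inv_cdf(1.0 - tail)


two_sided_value = gaussian_critical_value(alpha=0.05, two_sided=True)
one_sided_value = gaussian_critical_value(alpha=0.05, two_sided=False)
print(round(two_sided_value, 3), round(one_sided_value, 3))
```

A test statistic larger in magnitude than the critical value would lead to rejecting the null hypothesis at the chosen significance level.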

A Gentle Introduction to Statistical Sampling and Resampling

Last Updated on August 8, 2019

Data is the currency of applied machine learning. Therefore, it is important that it is both collected and used effectively. Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter. Data resampling, by contrast, refers to methods for economically using a collected dataset to improve the estimate of the population parameter and help to quantify the uncertainty of the estimate. Both data sampling and data […]
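
One common resampling method is the bootstrap, used here as an example of how resampling can quantify the uncertainty of an estimate. The excerpt does not name a specific method, so the bootstrap, the function name `bootstrap_mean_interval`, and the toy data are all assumptions made for this sketch.

```python
import random
from statistics import mean


def bootstrap_mean_interval(data, n_resamples=1000, coverage=0.95, seed=1):
    """Percentile bootstrap interval for the mean of a sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    # Take the percentiles that leave equal mass in each tail.
    tail = (1.0 - coverage) / 2.0
    lower_index = int(tail * n_resamples)
    upper_index = int((1.0 - tail) * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical data sample.
sample = [2, 4, 4, 5, 7, 9, 10, 12]
low, high = bootstrap_mean_interval(sample)
print(low, high)
```

The interval width gives a sense of how much the estimated mean could vary if the data were collected again.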

How to Calculate the 5-Number Summary for Your Data in Python

Last Updated on August 8, 2019

Data summarization provides a convenient way to describe all of the values in a data sample with just a few statistical values. The mean and standard deviation are used to summarize data with a Gaussian distribution, but may not be meaningful, or could even be misleading, if your data sample has a non-Gaussian distribution. In this tutorial, you will discover the five-number summary for describing the distribution of a data sample without assuming a […]
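
The five-number summary consists of the minimum, first quartile, median, third quartile, and maximum. A minimal sketch using only Python's standard library follows. The helper name `five_number_summary` and the sample values are hypothetical, and quartile conventions differ between libraries, so other tools may report slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a data sample."""
    # method="inclusive" treats the data as the full population and
    # interpolates linearly between the sorted observations.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical data sample.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
```

Because it relies on ranks rather than the mean and standard deviation, the summary stays meaningful for skewed samples.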

A Gentle Introduction to the Chi-Squared Test for Machine Learning

Last Updated on October 31, 2019

A common problem in applied machine learning is determining whether input features are relevant to the outcome to be predicted. This is the problem of feature selection. In the case of classification problems where input variables are also categorical, we can use statistical tests to determine whether the output variable is dependent or independent of the input variables. If independent, then the input variable is a candidate for a feature that may be irrelevant […]
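
The independence check described above can be sketched with Pearson's chi-squared statistic on a contingency table of counts. The function name `chi_squared_statistic` and the table below are hypothetical. The critical value 3.841 is the standard chi-squared cut-off for 1 degree of freedom at a 0.05 significance level, hard-coded here to keep the sketch dependency-free.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are input categories, columns are output classes.
counts = [[10, 20], [20, 10]]
statistic, dof = chi_squared_statistic(counts)

CRITICAL_VALUE_DOF1_ALPHA05 = 3.841  # chi-squared cut-off, 1 dof, alpha = 0.05
dependent = statistic > CRITICAL_VALUE_DOF1_ALPHA05
print(statistic, dof, dependent)
```

A statistic above the critical value suggests the output depends on the input variable, so the feature is less likely to be irrelevant.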

Statistical Significance Tests for Comparing Machine Learning Algorithms

Last Updated on August 8, 2019

Comparing machine learning methods and selecting a final model is a common operation in applied machine learning. Models are commonly evaluated using resampling methods like k-fold cross-validation from which mean skill scores are calculated and compared directly. Although simple, this approach can be misleading as it is hard to know whether the difference between mean skill scores is real or the result of a statistical fluke. Statistical significance tests are designed to address this […]
