A Gentle Introduction to Statistical Sampling and Resampling

Last Updated on August 8, 2019

Data is the currency of applied machine learning. Therefore, it is important that it is both collected and used effectively. Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter. Data resampling, by contrast, refers to methods for economically using a collected dataset to improve the estimate of the population parameter and to help quantify the uncertainty of the estimate. Both data sampling and data […]
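As a rough illustration of resampling, the sketch below uses the bootstrap (sampling with replacement from a collected dataset) to put a percentile interval around an estimated mean. The function name `bootstrap_mean_interval`, the example data, and the choice of 1,000 resamples are assumptions made for illustration and are not taken from the tutorial.

```python
import random
import statistics


def bootstrap_mean_interval(data, n_resamples=1000, alpha=0.05, seed=1):
    """Estimate the mean and a percentile bootstrap interval for it."""
    rng = random.Random(seed)  # private generator so results are repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 percentiles of the means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(data), lower, upper


# Hypothetical sample of observations.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.3, 4.0]
estimate, lower, upper = bootstrap_mean_interval(sample)
print(f"mean={estimate:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

The width of the interval gives a sense of how uncertain the estimate is; a larger original sample would typically produce a narrower interval.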

How to Calculate the 5-Number Summary for Your Data in Python

Last Updated on August 8, 2019

Data summarization provides a convenient way to describe all of the values in a data sample with just a few statistical values. The mean and standard deviation are used to summarize data with a Gaussian distribution, but may not be meaningful, or could even be misleading, if your data sample has a non-Gaussian distribution. In this tutorial, you will discover the five-number summary for describing the distribution of a data sample without assuming a […]
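A minimal sketch of a five-number summary (minimum, first quartile, median, third quartile, maximum) using only the Python standard library is shown below. The helper name `five_number_summary` is hypothetical, and the `"inclusive"` quartile method is one of several reasonable conventions; the tutorial itself may use a different library or convention.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    # quantiles() with n=4 returns the three quartile cut points.
    # The "inclusive" method treats the data as covering the full range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical data sample.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(values))  # (1, 3.0, 5.0, 7.0, 9)
```

Because none of these five values depend on a distributional assumption, the summary can be reported for any sample. `statistics.quantiles` requires Python 3.8 or later.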

A Gentle Introduction to the Chi-Squared Test for Machine Learning

Last Updated on October 31, 2019

A common problem in applied machine learning is determining whether input features are relevant to the outcome to be predicted. This is the problem of feature selection. In the case of classification problems where input variables are also categorical, we can use statistical tests to determine whether the output variable is dependent or independent of the input variables. If independent, then the input variable is a candidate for a feature that may be irrelevant […]
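To make the idea concrete, here is a small sketch that computes the Pearson chi-squared statistic for a contingency table of counts by hand. The function name `chi_squared_statistic` and the example table are assumptions for illustration. No continuity correction is applied, and the statistic is compared against the standard critical value of 3.841 (1 degree of freedom, 5% significance level) rather than converted to a p-value.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are input categories, columns are output classes.
table = [[10, 20],
         [20, 10]]
statistic, dof = chi_squared_statistic(table)

CRITICAL_VALUE = 3.841  # chi-squared critical value for dof=1, alpha=0.05
if statistic > CRITICAL_VALUE:
    print(f"stat={statistic:.3f}, dof={dof}: dependent (reject independence)")
else:
    print(f"stat={statistic:.3f}, dof={dof}: no evidence of dependence")
```

In this example every expected count is 15, so the statistic is 4 × 25/15 ≈ 6.667, which exceeds the critical value and suggests the input variable is related to the output.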

Statistical Significance Tests for Comparing Machine Learning Algorithms

Last Updated on August 8, 2019

Comparing machine learning methods and selecting a final model is a common operation in applied machine learning. Models are commonly evaluated using resampling methods like k-fold cross-validation from which mean skill scores are calculated and compared directly. Although simple, this approach can be misleading as it is hard to know whether the difference between mean skill scores is real or the result of a statistical fluke. Statistical significance tests are designed to address this […]

Controlled Experiments in Machine Learning

Last Updated on August 8, 2019

Systematic experimentation is a key part of applied machine learning. Given the complexity of machine learning methods, they resist formal analysis methods. Therefore, we must learn about the behavior of algorithms on our specific problems empirically. We do this using controlled experiments. In this tutorial, you will discover the important role that controlled experiments play in applied machine learning. After completing this tutorial, you will know: The need for systematic discovery via controlled experiments. […]

10 Examples of How to Use Statistical Methods in a Machine Learning Project

Last Updated on August 8, 2019

Statistics and machine learning are two very closely related fields. In fact, the line between the two can be very fuzzy at times. Nevertheless, there are methods that clearly belong to the field of statistics that are not only useful, but invaluable when working on a machine learning project. It would be fair to say that statistical methods are required to effectively work through a machine learning predictive modeling project. In this post, you […]

What is Statistics (and why is it important in machine learning)?

Last Updated on August 8, 2019

Statistics is a collection of tools that you can use to get answers to important questions about data. You can use descriptive statistical methods to transform raw observations into information that you can understand and share. You can use inferential statistical methods to reason from small samples of data to whole domains. In this post, you will discover clearly why statistics is important in general and for machine learning and generally the types of […]

The Close Relationship Between Applied Statistics and Machine Learning

Last Updated on August 8, 2019

The machine learning practitioner has a tradition of algorithms and a pragmatic focus on results and model skill above other concerns such as model interpretability. Statisticians work on much the same type of modeling problems under the names of applied statistics and statistical learning. Coming from a mathematical background, they have more of a focus on the behavior of models and explainability of predictions. The very close relationship between the two approaches to the […]

Statistics for Evaluating Machine Learning Models

Last Updated on August 14, 2020

Tom Mitchell’s classic 1997 book “Machine Learning” provides a chapter dedicated to statistical methods for evaluating machine learning models. Statistics provides an important set of tools used at each step of a machine learning project. A practitioner cannot effectively evaluate the skill of a machine learning model without using statistical methods. Unfortunately, statistics is an area that is foreign to most developers and computer science graduates. This makes the chapter in Mitchell’s seminal machine […]

How to Generate Random Numbers in Python

Last Updated on September 4, 2020

The use of randomness is an important part of the configuration and evaluation of machine learning algorithms. From the random initialization of weights in an artificial neural network, to the splitting of data into random train and test sets, to the random shuffling of a training dataset in stochastic gradient descent, generating random numbers and harnessing randomness is a required skill. In this tutorial, you will discover how to generate and work with random […]
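As a brief sketch of what working with random numbers can look like, the example below uses the standard library `random` module with a fixed seed to draw floats, draw integers, and shuffle a list. The helper name `seeded_draws` and the specific ranges are assumptions for illustration only.

```python
import random


def seeded_draws(seed, n=3):
    """Return repeatable random floats, integers, and a shuffled list."""
    rng = random.Random(seed)  # seeding makes the sequence reproducible
    floats = [rng.random() for _ in range(n)]         # values in [0.0, 1.0)
    integers = [rng.randint(0, 10) for _ in range(n)]  # 0..10, both inclusive
    items = list(range(10))
    rng.shuffle(items)  # shuffles the list in place
    return floats, integers, items


first = seeded_draws(seed=1)
second = seeded_draws(seed=1)
print(first)
print(first == second)  # True: the same seed yields the same sequence
```

Fixing the seed is a common way to make experiments that rely on randomness, such as train/test splits, repeatable from one run to the next.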
