How to Identify Outliers in your Data

Last Updated on August 16, 2020

Bojan Miletic asked a question about outlier detection in datasets when working with machine learning algorithms.

This post is in answer to his question.

If you have a question about machine learning, sign up for the newsletter and reply to an email, or use the contact form to ask. I will answer your question and may even turn it into a blog post.

Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Outliers

Many machine learning algorithms are sensitive to the range and distribution of attribute values in the input data.

Outliers in input data can skew and mislead the training process of machine learning algorithms, resulting in longer training times, less accurate models, and ultimately poorer results.
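To make this concrete, here is a minimal sketch of how a single extreme value distorts the summary statistics that many preparation steps (such as standardization) rely on. The values are made up for illustration and are not from any real dataset; the `summarize` helper is a hypothetical name.

```python
import statistics

# Hypothetical attribute values, invented for illustration.
clean = [10.0, 11.0, 9.0, 10.0, 10.0]
# The same values plus one extreme observation.
with_outlier = clean + [100.0]


def summarize(values):
    """Return (mean, median, population standard deviation)."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.pstdev(values),
    )


clean_mean, clean_median, clean_std = summarize(clean)
out_mean, out_median, out_std = summarize(with_outlier)

print(f"clean:        mean={clean_mean:.2f} median={clean_median:.2f} std={clean_std:.2f}")
print(f"with outlier: mean={out_mean:.2f} median={out_median:.2f} std={out_std:.2f}")
```

In this toy example the mean and standard deviation shift sharply once the extreme value is added, while the median does not move. Any method that scales or centers an attribute using the mean and standard deviation would inherit that distortion.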

Outlier
Photo by Robert S. Donovan, some rights reserved

Even before predictive models are prepared on training data, outliers can result in misleading representations and, in turn, misleading interpretations of collected data.