How to Generate Test Datasets in Python with scikit-learn

Last Updated on January 10, 2020

Test datasets are small contrived datasets that let you test a machine learning algorithm or test harness.

The data in test datasets have well-defined properties, such as linearity or non-linearity, that allow you to explore specific algorithm behavior. The scikit-learn Python library provides a suite of functions for generating samples from configurable test problems for regression and classification.

In this tutorial, you will discover test problems and how to use them in Python with scikit-learn.

After completing this tutorial, you will know:

  • How to generate multi-class classification prediction test problems.
  • How to generate binary classification prediction test problems.
  • How to generate linear regression prediction test problems.
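As a quick preview of these three kinds of problems, here is a minimal sketch using generators from sklearn.datasets. The choice of make_blobs, make_moons, and make_regression, along with the sample counts, noise levels, and random seed, is an assumption made for illustration and may differ from what the rest of the tutorial uses.

```python
from sklearn.datasets import make_blobs, make_moons, make_regression

# Multi-class classification: three Gaussian blobs in two dimensions
# (sizes and seed are illustrative assumptions)
X_blobs, y_blobs = make_blobs(n_samples=100, centers=3, n_features=2, random_state=1)

# Binary classification: two interleaving half-circles with a little noise
X_moons, y_moons = make_moons(n_samples=100, noise=0.1, random_state=1)

# Regression: a single input linearly related to the output, plus Gaussian noise
X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)

# Report the shape of each dataset and the class labels it contains
print(X_blobs.shape, sorted(set(y_blobs.tolist())))
print(X_moons.shape, sorted(set(y_moons.tolist())))
print(X_reg.shape, y_reg.shape)
```

Each call returns the inputs X and the targets y as NumPy arrays, so the results can be passed directly to a model or plotted.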


Let’s get started.

  • Updated Jan/2020: Updated for changes in scikit-learn v0.22 API.

Tutorial Overview

This tutorial is divided into 3 parts; they are:

  1. Test Datasets
  2. Classification Test Problems
  3. Regression Test Problems

Test Datasets

A problem when developing and implementing machine learning algorithms is knowing whether you have implemented them correctly. Buggy implementations often still appear to work.
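One way to make this concrete is to generate data whose true structure is known and check that an implementation recovers it. The sketch below is an illustration under stated assumptions, not the tutorial's own method: fit_least_squares is a hypothetical stand-in for the algorithm under test, and the noise-free setting is chosen so the true coefficients should be recovered almost exactly.

```python
import numpy as np
from sklearn.datasets import make_regression

# Generate a noise-free linear problem and ask for the true coefficients back
# (sample count, feature count, and seed are illustrative assumptions)
X, y, true_coef = make_regression(
    n_samples=50, n_features=3, n_informative=3,
    noise=0.0, coef=True, random_state=1,
)

def fit_least_squares(X, y):
    """Hypothetical algorithm under test: ordinary least squares without an intercept."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

estimated = fit_least_squares(X, y)

# With no noise and zero bias, a correct implementation should match the true coefficients
recovered = np.allclose(estimated, true_coef)
print("coefficients recovered:", recovered)
```

If a bug were introduced into the fitting function, this check would be expected to fail, which is harder to notice on real data where the true coefficients are unknown.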
