How to Handle Big-p, Little-n (p >> n) in Machine Learning

Last Updated on August 19, 2020

What if I have more columns than rows in my dataset?

Machine learning datasets are often structured or tabular data made up of rows and columns.

The columns that are fed as input to a model are called predictors, and their number is denoted “p”. The rows are called samples, and their number is denoted “n”. Most machine learning algorithms assume that there are many more samples than there are predictors, denoted as p << n.
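To make the notation concrete, here is a minimal sketch that reads n and p off a small tabular dataset. The helper name shape_of and the toy values are assumptions made up for illustration; they do not come from any library.

```python
def shape_of(rows):
    """Return (n, p): the number of samples (rows) and predictors (columns)."""
    n = len(rows)
    p = len(rows[0]) if rows else 0
    # Guard against ragged input, where rows differ in length.
    if any(len(row) != p for row in rows):
        raise ValueError("all rows must have the same number of columns")
    return n, p


# Hypothetical toy dataset: 5 samples, each with 2 predictors.
X = [
    [0.1, 1.0],
    [0.2, 0.9],
    [0.3, 0.8],
    [0.4, 0.7],
    [0.5, 0.6],
]

n, p = shape_of(X)
print(f"n={n} samples, p={p} predictors, p < n: {p < n}")
```

In this toy case there are more samples than predictors, which is the situation most algorithms expect.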

Sometimes, this is not the case, and there are many more predictors than samples in the dataset, referred to as “big-p, little-n” and denoted as p >> n. These problems often require specialized data preparation and modeling algorithms to address them correctly.
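One way to see why this situation needs care is a tiny linear example. The sketch below uses made-up numbers (n = 2 samples, p = 3 predictors) and a hypothetical predict helper. It shows two different weight vectors that both fit the training data exactly yet disagree on a new row, so the data alone cannot tell us which model to prefer.

```python
def predict(weights, row):
    """Linear prediction: the dot product of weights and predictor values."""
    return sum(w * x for w, x in zip(weights, row))


# Hypothetical training data: n = 2 samples, p = 3 predictors.
X = [
    [1, 0, 1],
    [0, 1, 1],
]
y = [1, 2]

# Two different weight vectors, chosen by hand for this illustration.
w_a = [1, 2, 0]
w_b = [0, 1, 1]

for name, w in (("w_a", w_a), ("w_b", w_b)):
    fitted = [predict(w, row) for row in X]
    print(name, "fits the training targets exactly:", fitted == y)

# The two models disagree on a row that was not in the training data.
new_row = [1, 1, 0]
print("w_a predicts", predict(w_a, new_row))
print("w_b predicts", predict(w_b, new_row))
```

With more unknown weights than samples, many models fit the training data perfectly. That is one reason extra constraints or specialized methods are typically needed when p >> n.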

In this tutorial, you will discover the challenge of big-p, little-n (p >> n) machine learning problems.

After completing this tutorial, you will know:

Most machine learning problems have many more samples than predictors and most machine learning algorithms make this assumption during the training process.
Some modeling problems have many more predictors than samples, referred to as p >> n.
Algorithms to explore when modeling machine learning datasets with more predictors than samples.

