How to Handle Big-p, Little-n (p >> n) in Machine Learning

Last Updated on August 19, 2020

What if I have more Columns than Rows in my dataset?

Machine learning datasets are often structured or tabular data comprised of rows and columns.

The columns that are fed as input to a model are called predictors or “p” and the rows are samples “n“. Most machine learning algorithms assume that there are many more samples than there are predictors, denoted as p << n.

Sometimes, this is not the case, and there are many more predictors than samples in the dataset, referred to as “big-p, little-n” and denoted as p >> n. These problems often require specialized data preparation and modeling algorithms to address them correctly.

In this tutorial, you will discover the challenge of big-p, little n or p >> n machine learning problems.

After completing this tutorial, you will know:

  • Most machine learning problems have many more samples than predictors and most machine learning algorithms make this assumption during the training process.
  • Some modeling problems have many more predictors than samples, referred to as p >> n.
  • Algorithms to explore when modeling machine learning datasets with more predictors than samples.

Kick-start your project with my new book To finish reading, please visit source site