How to Calculate Principal Component Analysis (PCA) from Scratch in Python

Last Updated on August 9, 2019

Principal Component Analysis (PCA) is an important machine learning method for dimensionality reduction.

It is a method that uses simple matrix operations from linear algebra and statistics to calculate a projection of the original data into the same number or fewer dimensions.
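To make the idea of "simple matrix operations" concrete, here is a minimal sketch of one common way to compute such a projection in NumPy: center the columns, compute the covariance matrix, take its eigendecomposition, and project onto the leading eigenvectors. The tiny dataset and the variable names (A, C, V, k, P) are assumptions chosen purely for illustration, and this is one possible formulation rather than the only one.

```python
import numpy as np

# Small illustrative dataset (hypothetical values): 3 samples, 2 features.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Center each column by subtracting its mean.
M = A.mean(axis=0)
C = A - M

# Covariance matrix of the centered features (columns are variables).
V = np.cov(C, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix.
# eigh returns eigenvalues in ascending order, so reorder to descending.
values, vectors = np.linalg.eigh(V)
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

# Project the centered data onto the first k principal components.
k = 1  # assumed number of components to keep
P = C @ vectors[:, :k]

print(values)
print(P)
```

Note that the sign of each eigenvector is arbitrary, so the projected values may come out negated on a different platform or library version; the magnitudes are what carry the information.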

In this tutorial, you will discover the Principal Component Analysis machine learning method for dimensionality reduction and how to implement it from scratch in Python.

After completing this tutorial, you will know:

  • The procedure for calculating the Principal Component Analysis and how to choose principal components.
  • How to calculate the Principal Component Analysis from scratch in NumPy.
  • How to calculate the Principal Component Analysis for reuse on more data in scikit-learn.


Let’s get started.

  • Update Apr/2018: Fixed typo in the explanation of the sklearn PCA attributes. Thanks kris.