Data Cleaning: Turn Messy Data into Tidy Data

Last Updated on August 16, 2020

Data preparation is difficult because the process is not objective, or at least it does not feel that way. Questions like “what is the best form of the data to describe the problem?” are not objective. You have to think from the perspective of the problem you want to solve and try a few different representations through your pipeline.

Hadley Wickham is an Adjunct Professor at Rice University and Chief Scientist at RStudio, and he’s deeply interested in this problem. He has authored some of the most popular R packages for organizing and presenting data, such as reshape, plyr, and ggplot2. In his journal article Tidy Data, Wickham presents his take on data cleaning and defines what he means by tidy data.
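To make the idea concrete before going further: in Wickham's paper, a tidy dataset is one where each variable forms a column and each observation forms a row. The snippet below is a minimal sketch of that reshaping, not code from the paper. It uses pandas rather than Wickham's R packages, and the table, its column names, and its values are hypothetical and invented only for illustration.

```python
import pandas as pd

# Hypothetical "messy" table: one row per patient, with the treatment
# variable spread across two column headers.
messy = pd.DataFrame({
    "patient": ["A", "B", "C"],
    "treatment_a": [10, 12, 9],
    "treatment_b": [14, 11, 13],
})

# Reshape from wide to long so that each row holds a single observation:
# one patient, one treatment, one result.
tidy = messy.melt(
    id_vars=["patient"],
    var_name="treatment",
    value_name="result",
)

print(tidy)
```

After the reshape, the treatment is an ordinary column rather than part of the header, so filtering, grouping, and plotting by treatment no longer require special handling for each column.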

Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Tidy Data, photo by Andrew King

Data Cleaning

A lot of data