Quick and Dirty Data Analysis with Pandas

Last Updated on January 28, 2020

Before you can select and prepare your data for modeling, you need to understand what you’ve got to start with.

If you’re a using the Python stack for machine learning, a library that you can use to better understand your data is Pandas.

In this post you will discover some quick and dirty recipes for Pandas to improve the understanding of your data in terms of it’s structure, distribution and relationships.

Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Update March/2018: Added alternate link to download the dataset as the original appears to have been taken down.

Data Analysis

Data analysis is about asking and answering questions about your data.

As a machine learning practitioner, you may not be very familiar with the domain in which you’re working. It’s ideal to have subject matter experts on hand, but this is not always possible.

These problems also apply when you are learning applied machine learning either with standard machine learning data sets,
To finish reading, please visit source site