Case Study: Predicting the Onset of Diabetes Within Five Years (part 1 of 3)

Last Updated on August 22, 2019

This is a guest post by Igor Shvartser, a clever young student I have been coaching.

This post is part 1 of a 3-part series on modeling the famous Pima Indians Diabetes dataset. It introduces the problem and the data. Part 2 will investigate feature selection and spot-checking algorithms, and Part 3 will investigate improvements to classification accuracy and the final presentation of results.

Predict the Onset of Diabetes

Data mining and machine learning are helping medical professionals diagnose more easily by bridging the gap between huge data sets and human knowledge. We can begin by applying machine learning techniques for classification to a dataset that describes a population at high risk of the onset of diabetes.

Diabetes mellitus affects 382 million people worldwide, and the number of people with type 2 diabetes is increasing in every country. Left untreated, diabetes can cause many complications.