A Simple Intuition for Overfitting, or Why Testing on Training Data is a Bad Idea

Last Updated on August 21, 2016 When you first start out with machine learning you load a dataset and try models. You might think to yourself, why can’t I just build a model with all of the data and evaluate it on the same dataset? It seems reasonable. More data to train the model is better, right? Evaluating the model and reporting results on the same dataset will tell you how good the model is, right? Wrong. In this post […]

Classification Accuracy is Not Enough: More Performance Measures You Can Use

Last Updated on June 20, 2019 When you build a model for a classification problem you almost always want to look at the accuracy of that model, measured as the number of correct predictions out of all predictions made. This is the classification accuracy. In a previous post, we looked at evaluating the robustness of a model for making predictions on unseen data using cross-validation and multiple cross-validation, where we used classification accuracy and average classification accuracy. Once you have a […]

Machine Learning Tips from a World Class Practitioner: Phil Brierley

Last Updated on June 7, 2016 Phil Brierley won the Heritage Health Prize Kaggle machine learning competition. Phil was trained as a mechanical engineer and has a background in data mining with his company Tiberius Data Mining. He is heavily into R these days and keeps a blog at Another Data Mining Blog. In October 2013 he presented to the Melbourne Users of R special interest group. The title of his talk was “Techniques to improve the accuracy of your Predictive Models” and you can […]

BigML Review: Discover the Clever Features in This Machine Learning as a Service Platform

Last Updated on August 16, 2020 Machine Learning has been commoditized into a service. This is a recent trend that looks like it will develop into the mainstream like commoditized storage and virtualization. It is the natural next step. In this review you will learn about BigML, which provides commoditized machine learning as a service for business analysts and application integration. About BigML BigML was co-founded by a group of five guys in 2011. Francisco Martin seems to be active […]

BigML Tutorial: Develop Your First Decision Tree and Make Predictions

Last Updated on June 7, 2016 BigML is a new and interesting machine learning as a service company based out of Corvallis, Oregon, USA. In a previous post, we reviewed the BigML service, the key features and the ways in which you could use this service in your business, on your side project or to present to clients. In this tutorial we will walk step by step through developing a predictive model using the BigML platform and use […]

The Seductive Trap of Black-Box Machine Learning

Last Updated on April 4, 2018 For as long as I have been participating in data mining and machine learning competitions, I have thought about automating my participation. Maybe it shows that I want to solve the problem of building the tool more than I want to solve the problem at hand. When working on a dataset, I typically spend a disproportionate amount of time thinking about algorithm tuning and running tuning experiments. I am prone to performing post-competition analysis […]

How to Layout and Manage Your Machine Learning Project

Last Updated on June 7, 2016 Project layout is critical for machine learning projects just as it is for software development projects. I think of it like language. A project layout organizes thoughts and gives you context for ideas just like knowing the names for things gives you the basis for thinking. In this post I want to highlight some considerations in the layout and management of your machine learning project. This is very much related to the goals of […]

Model Prediction Accuracy Versus Interpretation in Machine Learning

Last Updated on August 15, 2020 In their book Applied Predictive Modeling, Kuhn and Johnson comment early on the trade-off of model prediction accuracy versus model interpretation. For a given problem, it is critical to have a clear idea of which is the priority, accuracy or explainability, so that this trade-off can be made explicitly rather than implicitly. In this post you will discover and consider this important trade-off. Model Accuracy vs Explainability. Photo by Donald Hobern, some rights reserved […]

Clever Application Of A Predictive Model

Last Updated on August 15, 2020 What if you could use a predictive model to find new combinations of attributes that do not exist in the data but could be valuable? In Chapter 10 of Applied Predictive Modeling, Kuhn and Johnson provide a case study that does just this. It’s a fascinating and creative example of how to use a predictive model. In this post we will discover this less obvious use of a predictive model and the types of […]

How to Kick Ass in Competitive Machine Learning

Last Updated on June 7, 2016 David Kofoed Wind posted an article to the Kaggle blog No Free Hunch titled “Learning from the best”. In the post, David summarized six key areas related to participating and doing well in competitive machine learning, with quotes from top-performing Kagglers. In this post you will discover the key heuristics for doing well in competitive machine learning distilled from that post. Learning from the best. Photo by Lida, some rights reserved. Learning from Kaggle […]
