How to Kick Ass in Competitive Machine Learning

Last Updated on June 7, 2016. David Kofoed Wind posted an article to the Kaggle blog No Free Hunch titled “Learning from the best”. In the post, David summarized 6 key areas related to participating and doing well in competitive machine learning, with quotes from top-performing Kagglers. In this post you will discover the key heuristics for doing well in competitive machine learning, distilled from that post. Photo by Lida, some rights reserved. Learning from Kaggle […]

Read more

Going Beyond Predictions

Last Updated on June 7, 2016. The predictions you make with a predictive model do not matter; it is the use of those predictions that matters. Jeremy Howard was the President and Chief Scientist of Kaggle, the competitive machine learning platform. In 2012 he presented at the O’Reilly Strata conference on what he called the Drivetrain Approach for building “data products” that go beyond just predictions. In this post you will discover Howard’s Drivetrain Approach and how you can use […]

Read more

5 Benefits of Competitive Machine Learning

Last Updated on June 7, 2016. Jeremy Howard, formerly of Kaggle, gave a presentation at the University of San Francisco in mid-2013. In that presentation he touched on some of the broader benefits of machine learning competitions like those held on Kaggle. In this post you will discover 5 points I extracted from this talk that will motivate you to start participating in machine learning competitions. Competitive Machine Learning is a Meritocracy. Photo by PaulBarber, some rights reserved. […]

Read more

Building a Production Machine Learning Infrastructure

Last Updated on June 7, 2016. Midwest.io was a conference in Kansas City held July 14-15, 2014. At the conference, Josh Wills gave a talk on what it takes to build production machine learning infrastructure, titled “From the lab to the factory: Building a Production Machine Learning Infrastructure”. Josh Wills is the Senior Director of Data Science at Cloudera and formerly worked on Google’s ad auction system. In this post you will discover insight into […]

Read more

Model Selection Tips From Competitive Machine Learning

Last Updated on June 7, 2016. After spot checking algorithms on your problem and tuning the better-performing few, you ultimately need to select one or two best models with which to proceed. This problem is called model selection, and it can be vexing because you need to make a choice given incomplete information. This is where the test harness you create and the test options you choose are critical. In this post you will discover tips for model selection inspired by […]

Read more

How To Get Baseline Results And Why They Matter

Last Updated on June 27, 2017. In my courses and guides, I teach the preparation of a baseline result before diving into spot checking algorithms. A student of mine recently asked: If a baseline is not calculated for a problem, will it make the results of other algorithms questionable? He went on to ask: If other algorithms do not give better accuracy than the baseline, what lesson should we take from it? Does it indicate that the data set does not […]

Read more
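The baseline the post refers to is often the Zero Rule: predict the most frequent class from the training data for every test example. A minimal sketch in plain Python, with made-up labels purely for illustration:

```python
from collections import Counter

def zero_rule_baseline(y_train, y_test):
    """Predict the most frequent training class for every test example
    and return the resulting accuracy."""
    majority = Counter(y_train).most_common(1)[0][0]
    correct = sum(1 for y in y_test if y == majority)
    return correct / len(y_test)

# Hypothetical labels: class 0 dominates the training data.
y_train = [0, 0, 0, 1, 0, 1, 0]
y_test = [0, 1, 0, 0, 1]
print(zero_rule_baseline(y_train, y_test))  # 3 of 5 test labels are 0 -> 0.6
```

Any candidate algorithm that cannot beat this number is adding nothing over the trivial strategy, which is exactly the question the student raised.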

Why Aren’t My Results As Good As I Thought? You’re Probably Overfitting

Last Updated on August 15, 2020. We all know the satisfaction of running an analysis and seeing the results come back the way we want them to: 80% accuracy, 85%, 90%? The temptation is strong to turn straight to the Results section of the report we’re writing and put the numbers in. But wait: as always, it’s not that straightforward. Succumbing to this particular temptation could undermine the impact of an otherwise completely valid analysis. With most machine learning algorithms it’s […]

Read more
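As a toy illustration of the gap the post warns about, here is a deliberately overfit "model" that simply memorises its training examples: it scores perfectly on data it has seen and falls back to guessing on data it has not. The dataset and fallback rule are invented for this sketch:

```python
# Toy dataset: feature i with alternating labels 0, 1, 0, 1, ...
data = [(i, i % 2) for i in range(10)]
train, test = data[:6], data[6:]

# "Training" is pure memorisation: a lookup table of seen examples.
model = dict(train)

def predict(x):
    # Unseen inputs fall back to an arbitrary default guess of class 0.
    return model.get(x, 0)

train_acc = sum(predict(x) == y for x, y in train) / len(train)
test_acc = sum(predict(x) == y for x, y in test) / len(test)
print(train_acc, test_acc)  # 1.0 on train, 0.5 on test
```

The 100% training accuracy is exactly the flattering number the post warns against reporting; only the held-out score reflects how the model generalises.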

Data Management Matters And Why You Need To Take It Seriously

Last Updated on March 5, 2020. We live in a world drowning in data. Internet tracking, stock market movement, genome sequencing technologies and their ilk all produce enormous amounts of data. Most of this data is someone else’s responsibility, generated by someone else, stored in someone else’s database, which is maintained and made available by… you guessed it… someone else. But. Whenever we carry out a machine learning project we are working with a small subset of all the […]

Read more

Understand Your Problem and Get Better Results Using Exploratory Data Analysis

Last Updated on August 15, 2020. You often jump from problem to problem in applied machine learning, and you need to get up to speed on a new dataset, fast. A classical and under-utilised approach that you can use to quickly build a relationship with a new data problem is Exploratory Data Analysis. In this post you will discover Exploratory Data Analysis (EDA), the techniques and tactics that you can use, and why you should be performing EDA on your next problem. […]

Read more
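As a tiny, hypothetical example of the first-pass summary that EDA typically starts with, the standard-library statistics module is enough to surface a suspicious value in a numeric column (the data below is invented for illustration):

```python
import statistics

# Hypothetical numeric column from a freshly loaded dataset.
values = [4.2, 5.1, 3.9, 40.0, 5.0, 4.8, 5.3]

print("n     :", len(values))
print("mean  :", round(statistics.mean(values), 2))
print("median:", round(statistics.median(values), 2))
print("stdev :", round(statistics.stdev(values), 2))
# A mean far above the median hints at an outlier worth inspecting (40.0 here).
```

In practice you would reach for something like pandas' describe() and a few plots, but the habit is the same: look at the data before modelling it.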

Assessing and Comparing Classifier Performance with ROC Curves

Last Updated on March 5, 2020 The most commonly reported measure of classifier performance is accuracy: the percent of correct classifications obtained. This metric has the advantage of being easy to understand and makes comparison of the performance of different classifiers trivial, but it ignores many of the factors which should be taken into account when honestly assessing the performance of a classifier. What Is Meant By Classifier Performance? Classifier performance is more than just a count of correct classifications. […]

Read more
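To make the ROC idea concrete, here is a minimal sketch of building an ROC curve and its area from predicted scores, written in plain Python with invented labels and scores. It assumes no tied scores (a real implementation, such as scikit-learn's roc_curve, handles ties by grouping them):

```python
def roc_points(y_true, scores):
    """Compute (FPR, TPR) points by sweeping the decision threshold
    down through the sorted predicted scores. Assumes distinct scores."""
    pairs = sorted(zip(scores, y_true), reverse=True)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in pairs:
        if label == 1:
            tp += 1   # classifying this example as positive is a true positive
        else:
            fp += 1   # ...or a false positive
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical example: higher score = more confident the example is positive.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(y_true, scores)))  # 8/9, about 0.889
```

Unlike raw accuracy, the curve shows the trade-off between true and false positive rates at every threshold, which is what the post argues an honest assessment needs.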