How To Work Through A Problem Like A Data Scientist

Last Updated on August 15, 2020 In a 2010 post, Hilary Mason and Chris Wiggins described the OSEMN process as a taxonomy of tasks that a data scientist should feel comfortable working on. The title of the post was “A Taxonomy of Data Science” on the now-defunct dataists blog. This process has also been used as the structure of a recent book, specifically “Data Science at the Command Line: Facing the Future with Time-Tested Tools” by Jeroen Janssens published […]

Common Pitfalls In Machine Learning Projects

Last Updated on June 7, 2016 In a recent presentation, Ben Hamner described the common pitfalls in machine learning projects he and his colleagues have observed during competitions on Kaggle. The talk was titled “Machine Learning Gremlins” and was presented in February 2014 at Strata. In this post we take a look at the pitfalls from Ben’s talk, what they look like and how to avoid them. Machine Learning Process Early in the talk, Ben presented a snapshot of the process for working […]

What To Do During Machine Learning Model Runs

Last Updated on June 7, 2016 There was a recent question that asked “How to not waste-time/procrastinate while ml scripts are running?”. I think this is an important question. I think answers to this question show a level of organization or maturity in your approach to work. I left a small comment on this question, but in this post I elaborate on my answer and give you a few perspectives on how to consider this question, minimize it and even […]

Choosing Machine Learning Algorithms: Lessons from Microsoft Azure

Last Updated on August 12, 2019 Microsoft recently launched support for machine learning in their Azure cloud computing platform. Buried in some of their technical documentation for the platform are some resources that you may find useful for thinking about what machine learning algorithm to use in different situations. In this post we take a look at the Microsoft recommendations for machine learning algorithms and the lessons that we can use when working through machine learning problems on any platform. […]

How to Use a Machine Learning Checklist to Get Accurate Predictions, Reliably

Last Updated on August 15, 2020 How do you get accurate results using machine learning on problem after problem? The difficulty is that each problem is unique, requiring different data sources, features, algorithms, algorithm configurations and on and on. The solution is to use a checklist that guarantees a good result every time. In this post you will discover a checklist that you can use to reliably get good results on your machine learning problems. […]

Simple 3-Step Methodology To The Best Machine Learning Algorithm

Last Updated on August 15, 2020 How do you choose the best algorithm for your dataset? Machine learning is a problem of induction where general rules are learned from specific observed data from the domain. It is infeasible (impossible?) to know what representation or what algorithm to use to best learn from the data on a specific problem beforehand, without knowing the problem so well that you probably don’t need machine learning to begin with. So what algorithm should you use […]

Deploy Your Predictive Model To Production

Last Updated on September 30, 2016 5 Best Practices For Operationalizing Machine Learning. Not all predictive models are at Google-scale. Sometimes you develop a small predictive model that you want to put in your software. I recently received this reader question: “Actually, there is a part that is missing in my knowledge about machine learning. All tutorials give you the steps up until you build your machine learning model. How could you use this model?” In this post, we look at […]

Machine Learning Performance Improvement Cheat Sheet

Last Updated on May 22, 2019 32 Tips, Tricks and Hacks That You Can Use To Make Better Predictions. The most valuable part of machine learning is predictive modeling. This is the development of models that are trained on historical data and make predictions on new data. And the number one question when it comes to predictive modeling is: How can I get better results? This cheat sheet contains my best advice distilled from years of my own application and […]

10 Standard Datasets for Practicing Applied Machine Learning

Last Updated on May 20, 2020 The key to getting good at applied machine learning is practicing on lots of different datasets. This is because each problem is different, requiring subtly different data preparation and modeling methods. In this post, you will discover 10 top standard machine learning datasets that you can use for practice. Let’s dive in. Update Mar/2018: Added alternate link to download the Pima Indians and Boston Housing datasets as the originals appear to have been taken […]

How to Get Started with Kaggle

Last Updated on March 11, 2017 4-Step Process for Getting Started and Getting Good at Competitive Machine Learning. Kaggle is a community and site for hosting machine learning competitions. Competitive machine learning can be a great way to develop and practice your skills, as well as demonstrate your capabilities. In this post, you will discover a simple 4-step process to get started and get good at competitive machine learning on Kaggle. Let’s get started. […]
