How to Train a Final Machine Learning Model

The machine learning model that we use to make predictions on new data is called the final model. There can be confusion in applied machine learning about how to train a final model. This confusion is common among beginners to the field, who ask questions such as: How do I predict with cross-validation? Which model do I choose from cross-validation? Do I use the model after preparing it on the training dataset? This post will clear up the confusion. […]

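
A minimal sketch of one common workflow. It assumes cross-validation is used only to estimate the skill of a modeling procedure, and that a single final model is then fit on all available data. To keep the sketch self-contained, the "model" is a hypothetical mean predictor, and `fit_mean_model`, `k_fold_indices`, and `cross_validate` are illustrative names, not code from the post.

```python
from statistics import mean

def fit_mean_model(y_train):
    # Hypothetical stand-in for a real learning algorithm:
    # the "model" is just the mean of the training targets.
    return mean(y_train)

def k_fold_indices(n, k):
    # Split indices 0..n-1 into k contiguous folds (no shuffling, for clarity).
    fold_size, remainder = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds

def cross_validate(y, k):
    # Estimate the skill (mean absolute error) of the modeling procedure.
    # The k models built here are discarded once the estimate is computed.
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        test_set = set(test_idx)
        y_train = [v for i, v in enumerate(y) if i not in test_set]
        model = fit_mean_model(y_train)
        scores.append(mean(abs(y[i] - model) for i in test_idx))
    return mean(scores)

y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0]
estimated_mae = cross_validate(y, k=3)  # skill estimate for the procedure
final_model = fit_mean_model(y)         # final model: same procedure, all data
print(estimated_mae, final_model)
```

The k models trained inside `cross_validate` exist only to produce the skill estimate. The final model is a fresh fit of the same procedure on every row.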

7 Ways to Handle Large Data Files for Machine Learning

Exploring and applying machine learning algorithms to datasets that are too large to fit into memory is common. This leads to questions like: How do I load my multiple-gigabyte data file? Algorithms crash when I try to run my dataset; what should I do? Can you help me with out-of-memory errors? In this post, I want to offer some common suggestions you may want to consider. […]

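
One commonly suggested approach for files that do not fit in memory is to process them progressively, a chunk of rows at a time, rather than loading everything at once. The following is a minimal sketch that uses only the Python standard library. It assumes a CSV file with a header row and a numeric column. The in-memory `io.StringIO` stands in for a large file on disk, and `iter_chunks` and `column_mean` are hypothetical helper names.

```python
import csv
import io
from itertools import islice

def iter_chunks(rows, chunk_size):
    # Yield successive lists of at most chunk_size rows from any row iterator,
    # so only one chunk is held in memory at a time.
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk

def column_mean(file_obj, column, chunk_size=2):
    # Compute a column mean incrementally: keep a running sum and count
    # instead of materializing the whole column.
    total, count = 0.0, 0
    reader = csv.DictReader(file_obj)
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")

# For a real file this would be open("data.csv", newline="")  (hypothetical path).
data = io.StringIO("x,y\n1,10\n2,20\n3,30\n4,40\n5,50\n")
print(column_mean(data, "y", chunk_size=2))  # prints 30.0
```

In practice the chunk size would be chosen so that a single chunk fits comfortably in memory, typically thousands of rows rather than two.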

What is the Difference Between Test and Validation Datasets?

Last Updated on August 14, 2020
A validation dataset is a sample of data held back from training your model that is used to give an estimate of model skill while tuning the model’s hyperparameters. The validation dataset differs from the test dataset, which is also held back from training but is instead used to give an unbiased estimate of the skill of the final tuned model when comparing or selecting between final models. There is much […]

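
A minimal sketch of how the two held-back samples play different roles. It assumes a toy 1-D dataset and a 1-D k-nearest-neighbour regressor whose hyperparameter is `k`. The split sizes, the candidate values of `k`, and the helper names `knn_predict` and `mse` are illustrative assumptions.

```python
import random

def knn_predict(train, x, k):
    # 1-D k-nearest-neighbour regression: average the targets of the
    # k training points whose inputs are closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    # Mean squared error of the k-NN model on a list of (x, y) pairs.
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

# Toy dataset (assumption): y = x^2 plus a little Gaussian noise.
rng = random.Random(0)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.05)) for x in range(100)]
rng.shuffle(data)

# Hold back two samples: validation (for tuning) and test (for the final estimate).
train, validation, test = data[:60], data[60:80], data[80:]

# Tune the hyperparameter k using the validation set only.
candidates = [1, 3, 5, 9, 15]
best_k = min(candidates, key=lambda k: mse(train, validation, k))

# Use the test set once, after tuning, to estimate the tuned model's skill.
test_mse = mse(train, test, best_k)
print(best_k, round(test_mse, 4))
```

The validation set is consulted repeatedly during tuning, so its score tends to become optimistically biased. The test set is consulted only once, after tuning, which is what keeps its estimate unbiased.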

How Much Training Data is Required for Machine Learning?

Last Updated on May 23, 2019
The amount of data you need depends both on the complexity of your problem and on the complexity of your chosen algorithm. This is true, but it does not help you if you are at the pointy end of a machine learning project. A common question I get asked is: How much data do I need? I cannot answer this question directly for you, or for anyone. But I can give you a handful […]


What is the Difference Between a Parameter and a Hyperparameter?

Last Updated on June 17, 2019
It can be confusing when you get started in applied machine learning. There are many terms to learn, and many of them are not used consistently. This is especially true if you have come from another field of study that uses some of the same terms as machine learning, but uses them differently. For example: the terms “model parameter” and “model hyperparameter.” Not having a clear definition for these […]

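
A minimal sketch of the distinction, using a simple linear model `y = w * x + b` fit by gradient descent. This model is an illustrative choice and is not taken from the post. In it, `w` and `b` are model parameters estimated from the data, while `learning_rate` and `epochs` are hyperparameters the practitioner sets before training.

```python
def fit_linear(xs, ys, learning_rate=0.05, epochs=2000):
    # Hyperparameters: learning_rate and epochs are chosen by the practitioner
    # before training; they are not estimated from the data.
    w, b = 0.0, 0.0  # Model parameters: learned from the data during training.
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

Changing `learning_rate` or `epochs` changes how the parameters are found, but the values that come out of training are `w` and `b`.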

How to Plan and Run Machine Learning Experiments Systematically

Machine learning experiments can take a long time: hours, days, and in some cases even weeks. This gives you a lot of time to think about and plan additional experiments to perform. In addition, the average applied machine learning project may require tens to hundreds of discrete experiments in order to find a data preparation, model, and model configuration that gives good or great performance. The drawn-out nature of the experiments means that you need to carefully plan and manage […]


Why Applied Machine Learning Is Hard

How to Handle the Intractability of Applied Machine Learning. Applied machine learning is challenging. You must make many decisions where there is no known “right answer” for your specific problem, such as: What framing of the problem to use? What input and output data to use? What learning algorithm to use? What algorithm configuration to use? This is challenging for beginners who expect that they can calculate or be told what data to use or how to best configure an […]


So, You are Working on a Machine Learning Problem…

Last Updated on January 9, 2019
So, you’re working on a machine learning problem. I want to really nail down where you’re at right now. Let me make some guesses…
1) You Have a Problem
So you have a problem that you need to solve. Maybe it’s your problem, an idea you have, a question, or something you want to address. Or maybe it is […]


The Model Performance Mismatch Problem (and what to do about it)

What To Do If Model Test Results Are Worse than Training. The usual procedure when evaluating machine learning models is to fit and evaluate them on training data, then verify that the model has good skill on a held-back test dataset. Often, you will see very promising performance when evaluating the model on the training dataset and poor performance when evaluating it on the test set. In this post, you will discover techniques and issues to consider when you […]

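
A minimal sketch of the train-versus-test comparison described above. It uses toy data and a deliberately overfit model: a 1-nearest-neighbour regressor, which memorizes its training data, so its training error is zero while its error on held-back data is not. The data and the helper names `predict_1nn` and `mean_abs_error` are illustrative assumptions.

```python
import random

def predict_1nn(train, x):
    # 1-nearest-neighbour: copy the target of the closest training input.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mean_abs_error(train, data):
    # Mean absolute error of the 1-NN model (fit on train) over data.
    return sum(abs(y - predict_1nn(train, x)) for x, y in data) / len(data)

# Toy data (assumption): a linear trend plus Gaussian noise, distinct x values.
rng = random.Random(1)
data = [(float(x), 0.5 * x + rng.gauss(0, 1.0)) for x in range(40)]
rng.shuffle(data)
train, test = data[:30], data[30:]

train_mae = mean_abs_error(train, train)  # skill on the data the model was fit on
test_mae = mean_abs_error(train, test)    # skill on held-back data
print(f"train MAE={train_mae:.3f}  test MAE={test_mae:.3f}")
```

A large gap like this one is a typical symptom of overfitting. It is only one possible cause of the mismatch, shown here because it is easy to reproduce.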

How To Know if Your Machine Learning Model Has Good Performance

After you develop a machine learning model for your predictive modeling problem, how do you know if the performance of the model is any good? This is a common question I am asked by beginners. As a beginner, you often seek an answer to this question, e.g., you want someone to tell you whether an accuracy of x% or an error score of x is good or not. In this post, you will discover how to answer this question for […]

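
One common way to judge whether a score is good is to compare it with the score of a naive baseline on the same test data. The following is a minimal sketch for a classification problem, using a majority-class baseline. The labels and model predictions are made-up values for illustration, not results.

```python
from collections import Counter

def zero_rule_predictions(y_train, n):
    # Naive baseline (assumption): always predict the most frequent training class.
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels and model predictions, for illustration only.
y_train = ["no", "no", "no", "yes", "no", "yes", "no", "no"]
y_test = ["no", "yes", "no", "no", "no", "yes", "no", "no", "no", "no"]
model_pred = ["no", "yes", "no", "no", "yes", "yes", "no", "no", "no", "no"]

baseline_acc = accuracy(y_test, zero_rule_predictions(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)
print(baseline_acc, model_acc)  # prints 0.8 0.9
```

Here the model's 90% accuracy is only 10 points better than always predicting "no". The baseline reframes the score as relative skill rather than an absolute number.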