Training-validation-test split and cross-validation done right

One crucial step in machine learning is the choice of model. A suitable model with suitable hyperparameters is the key to a good prediction result. When we are faced with a choice between models, how should the decision be made? This is why we have cross-validation. In scikit-learn, there is a family of functions that help us do this. But quite often, we see cross-validation used improperly, or its result interpreted incorrectly. In […]
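As a rough illustration of the idea in this excerpt, the sketch below holds out a test set first and then builds k-fold training/validation splits from the remaining samples. It is a minimal pure-Python sketch, not the scikit-learn API; the function names `hold_out_test` and `k_fold_indices` and their parameters are hypothetical and chosen only for this example.

```python
import random


def hold_out_test(n_samples, test_frac=0.2, seed=0):
    """Shuffle the indices 0..n_samples-1 and set aside a test portion.

    Returns (remaining_indices, test_indices). The test indices should not
    be used for choosing between models.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_test = int(n_samples * test_frac)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Every index appears in exactly one validation fold.
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    remaining, test = hold_out_test(10)
    for train, validation in k_fold_indices(remaining, k=4):
        print(sorted(train), sorted(validation))
```

Each candidate model would be fitted on `train` and scored on `validation` for every fold, and the averaged score used to compare models; the held-out `test` indices are only touched once, after the choice is made.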

Data, Learning and Modeling

Last Updated on January 6, 2017

There are key concepts in machine learning that lay the foundation for understanding the field. In this post, you will learn the nomenclature (standard terms) that is used when describing data and datasets. You will also learn the concepts and terms used to describe learning and modeling from data that will provide a valuable intuition for your journey through the field of machine learning. Data: Machine learning methods learn from examples. It is important […]

How to Define Your Machine Learning Problem

Last Updated on June 7, 2016

The first step in any project is defining your problem. You can use the most powerful and shiniest algorithms available, but the results will be meaningless if you are solving the wrong problem. In this post you will learn the process for thinking deeply about your problem before you get started. This is unarguably the most important aspect of applying machine learning.

What is the problem? Photo attributed to Eleaf, some rights reserved.

Problem Definition […]

How to Evaluate Machine Learning Algorithms

Last Updated on August 16, 2020

Once you have defined your problem and prepared your data, you need to apply machine learning algorithms to the data in order to solve your problem. You can spend a lot of time choosing, running and tuning algorithms. You want to make sure you are using your time effectively to get closer to your goal. In this post you will step through a process to rapidly test algorithms and discover whether or not there […]

How to Improve Machine Learning Results

Last Updated on August 16, 2020

Having one or two algorithms that perform reasonably well on a problem is a good start, but sometimes you may be incentivised to get the best result you can given the time and resources you have available. In this post, you will review methods you can use to squeeze out extra performance and improve the results you are getting from machine learning algorithms. When tuning algorithms you must have high confidence in the […]

How to Use Machine Learning Results

Last Updated on June 7, 2016

Once you have found and tuned a viable model of your problem, it is time to make use of that model. You may need to revisit your "why" and remind yourself what form of solution you need for the problem you are solving. The problem is not addressed until you do something with the results. In this post you will learn tactics for presenting your results in answer to a question and considerations when […]

What is Data Mining and KDD

Last Updated on August 16, 2020

I am very interested in processes. I want to know good ways to do things, even the best way to do things if possible. Even if you don’t have skill or deep understanding, process can get you a long way. It can lead the way, and skill and deep understanding can follow. At least, I have used this to drive much of my work. I think it’s useful to study data mining as it […]

Reproducible Machine Learning Results By Default

Last Updated on August 16, 2020

It is good practice to have reproducible outcomes in software projects. It might even be standard practice by now; I hope it is. You can take any developer off the street and they should be able to follow your process to check out the code base from revision control and make a build of the software ready to use. Even better if you have a procedure for setting up an environment and for releasing […]

Why you should be Spot-Checking Algorithms on your Machine Learning Problems

Last Updated on August 16, 2020

Spot-checking algorithms is about getting a quick assessment of a bunch of different algorithms on your machine learning problem so that you know what algorithms to focus on and what to discard.

Photo by withassociates, some rights reserved.

In this post you will discover the 3 benefits of spot-checking algorithms, 5 tips for spot-checking on your next problem and the top 10 most popular data mining algorithms that you could use in your suite […]

Applied Machine Learning Process

Last Updated on July 5, 2019

The Systematic Process For Working Through Predictive Modeling Problems That Delivers Above Average Results

Over time, working on applied machine learning problems, you develop a pattern or process for quickly getting to good robust results. Once developed, you can use this process again and again on project after project. The more robust and developed your process, the faster you can get to reliable results. In this post, I want to share with you the skeleton […]
