Imbalanced Multiclass Classification with the E.coli Dataset

Last Updated on August 21, 2020 Multiclass classification problems are those where a label must be predicted, but there are more than two labels that may be predicted. These are challenging predictive modeling problems because a sufficiently representative number of examples of each class is required for a model to learn the problem. It is made challenging when the number of examples in each class is imbalanced, or skewed toward one or a few of the classes with very few […]

Read more

Neural Networks are Function Approximation Algorithms

Last Updated on August 27, 2020 Supervised learning in machine learning can be described in terms of function approximation. Given a dataset comprised of inputs and outputs, we assume that there is an unknown underlying function that is consistent in mapping inputs to outputs in the target domain and resulted in the dataset. We then use supervised learning algorithms to approximate this function. Neural networks are an example of a supervised machine learning algorithm that is perhaps best understood in […]

Read more

How to Perform Data Cleaning for Machine Learning with Python

Last Updated on June 30, 2020 Data cleaning is a critically important step in any machine learning project. In tabular data, there are many different statistical analysis and data visualization techniques you can use to explore your data in order to identify data cleaning operations you may want to perform. Before jumping to the sophisticated methods, there are some very basic data cleaning operations that you probably should perform on every single machine learning project. These are so basic that […]

Read more

PyTorch Tutorial: How to Develop Deep Learning Models with Python

Last Updated on August 27, 2020 Predictive modeling with deep learning is a skill that modern developers need to know. PyTorch is the premier open-source deep learning framework developed and maintained by Facebook. At its core, PyTorch is a mathematical library that allows you to perform efficient computation and automatic differentiation on graph-based models. Achieving this directly is challenging, although thankfully, the modern PyTorch API provides classes and idioms that allow you to easily develop a suite of deep learning […]

Read more

4 Distance Measures for Machine Learning

Last Updated on August 19, 2020 Distance measures play an important role in machine learning. They provide the foundation for many popular and effective machine learning algorithms like k-nearest neighbors for supervised learning and k-means clustering for unsupervised learning. Different distance measures must be chosen and used depending on the types of the data. As such, it is important to know how to implement and calculate a range of different popular distance measures and the intuitions for the resulting scores. […]

Read more

How to Develop Multi-Output Regression Models with Python

Last Updated on September 15, 2020 Multioutput regression are regression problems that involve predicting two or more numerical values given an input example. An example might be to predict a coordinate given an input, e.g. predicting x and y values. Another example would be multi-step time series forecasting that involves predicting multiple future time series of a given variable. Many machine learning algorithms are designed for predicting a single numeric value, referred to simply as regression. Some algorithms do support […]

Read more

How to Calculate Feature Importance With Python

Last Updated on August 20, 2020 Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable. There are many types and sources of feature importance scores, although popular examples include statistical correlation scores, coefficients calculated as part of linear models, decision trees, and permutation importance scores. Feature importance scores play an important role in a predictive modeling project, including providing insight into the data, insight into the […]

Read more

Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost

Last Updated on August 28, 2020 Gradient boosting is a powerful ensemble machine learning algorithm. It’s popular for structured predictive modeling problems, such as classification and regression on tabular data, and is often the main algorithm or one of the main algorithms used in winning solutions to machine learning competitions, like those on Kaggle. There are many implementations of gradient boosting available, including standard implementations in SciPy and efficient third-party libraries. Each uses a different interface and even different names […]

Read more

What Is Argmax in Machine Learning?

Last Updated on August 19, 2020 Argmax is a mathematical function that you may encounter in applied machine learning. For example, you may see “argmax” or “arg max” used in a research paper used to describe an algorithm. You may also be instructed to use the argmax function in your algorithm implementation. This may be the first time that you encounter the argmax function and you may wonder what it is and how it works. In this tutorial, you will […]

Read more

10 Clustering Algorithms With Python

Last Updated on August 20, 2020 Clustering or cluster analysis is an unsupervised learning problem. It is often used as a data analysis technique for discovering interesting patterns in data, such as groups of customers based on their behavior. There are many clustering algorithms to choose from and no single best clustering algorithm for all cases. Instead, it is a good idea to explore a range of clustering algorithms and different configurations for each algorithm. In this tutorial, you will […]

Read more
1 799 800 801 802 803 859