Articles About Machine Learning

A Gentle Introduction to the Fbeta-Measure for Machine Learning

The Fbeta-measure is a configurable single-score metric for evaluating a binary classification model based on the predictions made for the positive class. It is calculated from precision and recall. Precision is the fraction of positive-class predictions that are correct. Recall is the fraction of all actual positive examples that the model correctly predicts as positive. Maximizing precision minimizes false-positive errors, whereas maximizing recall minimizes false-negative errors. The […]
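As a rough illustration of how precision and recall combine into one score, the sketch below uses the standard definition Fbeta = (1 + beta^2) * P * R / (beta^2 * P + R). The function name `fbeta_from_counts` and the example counts are hypothetical and chosen only for this sketch; they do not come from the article.

```python
def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """Compute the Fbeta score from confusion-matrix counts.

    tp, fp, fn: true positives, false positives, false negatives.
    beta: weight on recall relative to precision (beta > 1 favors recall,
    beta < 1 favors precision).
    """
    # Precision: share of positive predictions that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives,
# which gives precision = 0.8 and recall = 0.5.
f_half = fbeta_from_counts(8, 2, 8, beta=0.5)  # leans toward precision
f_one = fbeta_from_counts(8, 2, 8, beta=1.0)   # balanced (the F1 score)
f_two = fbeta_from_counts(8, 2, 8, beta=2.0)   # leans toward recall
print(f"F0.5={f_half:.3f} F1={f_one:.3f} F2={f_two:.3f}")
```

Because precision is higher than recall in this example, the score falls as beta grows and recall receives more weight.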

How to Calibrate Probabilities for Imbalanced Classification

Last Updated on August 21, 2020

Many machine learning models are capable of predicting a probability or probability-like score for class membership. Probabilities provide a required level of granularity for evaluating and comparing models, especially on imbalanced classification problems where tools like ROC Curves are used to interpret predictions and the ROC AUC metric is used to compare model performance, both of which use probabilities. Unfortunately, the probabilities or probability-like scores predicted by many models are not calibrated. This means […]

Develop a Model for the Imbalanced Classification of Good and Bad Credit

Last Updated on August 28, 2020

Misclassification errors on the minority class are more important than other types of prediction errors for some imbalanced classification tasks. One example is the problem of classifying bank customers as to whether they should receive a loan or not. Giving a loan to a bad customer marked as a good customer results in a greater cost to the bank than denying a loan to a good customer marked as a bad customer. This requires […]

Imbalanced Classification Model to Detect Mammography Microcalcifications

Last Updated on August 21, 2020

Cancer detection is a popular example of an imbalanced classification problem because there are often significantly more cases of non-cancer than actual cancer. A standard imbalanced classification dataset is the mammography dataset that involves detecting breast cancer from radiological scans, specifically the presence of clusters of microcalcifications that appear bright on a mammogram. This dataset was constructed by scanning the images, segmenting them into candidate objects, and using computer vision techniques to describe each […]

Predictive Model for the Phoneme Imbalanced Classification Dataset

Last Updated on August 21, 2020

Many binary classification tasks do not have an equal number of examples from each class, i.e. the class distribution is skewed or imbalanced. Nevertheless, accuracy is equally important in both classes. An example is the classification of vowel sounds from European languages as either nasal or oral in speech recognition, where there are many more examples of nasal than oral vowels. Classification accuracy is important for both classes, although accuracy as a metric cannot […]

Imbalanced Classification with the Adult Income Dataset

Last Updated on August 21, 2020

Many binary classification tasks do not have an equal number of examples from each class, i.e. the class distribution is skewed or imbalanced. A popular example is the adult income dataset that involves predicting personal income levels as above or below $50,000 per year based on personal details such as relationship and education level. There are many more cases of incomes less than $50K than above $50K, although the skew is not severe. This […]

Step-By-Step Framework for Imbalanced Classification Projects

Last Updated on March 19, 2020

Classification predictive modeling problems involve predicting a class label for a given set of inputs. It is a challenging problem in general, especially if little is known about the dataset, as there are tens, if not hundreds, of machine learning algorithms to choose from. The problem is made significantly more difficult if the distribution of examples across the classes is imbalanced. This requires the use of specialized methods to either change the dataset or […]

Imbalanced Classification with the Fraudulent Credit Card Transactions Dataset

Last Updated on August 21, 2020

Fraud is a major problem for credit card companies, both because of the large volume of transactions that are completed each day and because many fraudulent transactions look a lot like normal transactions. Identifying fraudulent credit card transactions is a common type of imbalanced binary classification where the focus is on the positive (is fraud) class. As such, metrics like precision and recall can be used to summarize model performance in terms of […]

Imbalanced Multiclass Classification with the Glass Identification Dataset

Last Updated on August 21, 2020

Multiclass classification problems are those where a label must be predicted, but there are more than two labels that may be predicted. These are challenging predictive modeling problems because a sufficiently representative number of examples of each class is required for a model to learn the problem. The problem becomes more challenging when the number of examples in each class is imbalanced, or skewed toward one or a few of the classes with very few […]

Imbalanced Multiclass Classification with the E.coli Dataset

Last Updated on August 21, 2020

Multiclass classification problems are those where a label must be predicted, but there are more than two labels that may be predicted. These are challenging predictive modeling problems because a sufficiently representative number of examples of each class is required for a model to learn the problem. The problem becomes more challenging when the number of examples in each class is imbalanced, or skewed toward one or a few of the classes with very few […]
