How to Develop a Probabilistic Model of Breast Cancer Patient Survival

Last Updated on August 21, 2020 Developing a probabilistic model is challenging in general, although it is made more so when there is skew in the distribution of cases, referred to as an imbalanced dataset. The Haberman Dataset describes the five year or greater survival of breast cancer patient patients in the 1950s and 1960s and mostly contains patients that survive. This standard machine learning dataset can be used as the basis of developing a probabilistic model that predicts the […]

Read more

How to Develop an Imbalanced Classification Model to Detect Oil Spills

Last Updated on August 21, 2020 Many imbalanced classification tasks require a skillful model that predicts a crisp class label, where both classes are equally important. An example of an imbalanced classification problem where a class label is required and both classes are equally important is the detection of oil spills or slicks in satellite images. The detection of a spill requires mobilizing an expensive response, and missing an event is equally expensive, causing damage to the environment. One way […]

Read more

A Gentle Introduction to the Fbeta-Measure for Machine Learning

Fbeta-measure is a configurable single-score metric for evaluating a binary classification model based on the predictions made for the positive class. The Fbeta-measure is calculated using precision and recall. Precision is a metric that calculates the percentage of correct predictions for the positive class. Recall calculates the percentage of correct predictions for the positive class out of all positive predictions that could be made. Maximizing precision will minimize the false-positive errors, whereas maximizing recall will minimize the false-negative errors. The […]

Read more

How to Calibrate Probabilities for Imbalanced Classification

Last Updated on August 21, 2020 Many machine learning models are capable of predicting a probability or probability-like scores for class membership. Probabilities provide a required level of granularity for evaluating and comparing models, especially on imbalanced classification problems where tools like ROC Curves are used to interpret predictions and the ROC AUC metric is used to compare model performance, both of which use probabilities. Unfortunately, the probabilities or probability-like scores predicted by many models are not calibrated. This means […]

Read more

Develop a Model for the Imbalanced Classification of Good and Bad Credit

Last Updated on August 28, 2020 Misclassification errors on the minority class are more important than other types of prediction errors for some imbalanced classification tasks. One example is the problem of classifying bank customers as to whether they should receive a loan or not. Giving a loan to a bad customer marked as a good customer results in a greater cost to the bank than denying a loan to a good customer marked as a bad customer. This requires […]

Read more

Imbalanced Classification Model to Detect Mammography Microcalcifications

Last Updated on August 21, 2020 Cancer detection is a popular example of an imbalanced classification problem because there are often significantly more cases of non-cancer than actual cancer. A standard imbalanced classification dataset is the mammography dataset that involves detecting breast cancer from radiological scans, specifically the presence of clusters of microcalcifications that appear bright on a mammogram. This dataset was constructed by scanning the images, segmenting them into candidate objects, and using computer vision techniques to describe each […]

Read more

Predictive Model for the Phoneme Imbalanced Classification Dataset

Last Updated on August 21, 2020 Many binary classification tasks do not have an equal number of examples from each class, e.g. the class distribution is skewed or imbalanced. Nevertheless, accuracy is equally important in both classes. An example is the classification of vowel sounds from European languages as either nasal or oral on speech recognition where there are many more examples of nasal than oral vowels. Classification accuracy is important for both classes, although accuracy as a metric cannot […]

Read more

Imbalanced Classification with the Adult Income Dataset

Last Updated on August 21, 2020 Many binary classification tasks do not have an equal number of examples from each class, e.g. the class distribution is skewed or imbalanced. A popular example is the adult income dataset that involves predicting personal income levels as above or below $50,000 per year based on personal details such as relationship and education level. There are many more cases of incomes less than $50K than above $50K, although the skew is not severe. This […]

Read more

Step-By-Step Framework for Imbalanced Classification Projects

Last Updated on March 19, 2020 Classification predictive modeling problems involve predicting a class label for a given set of inputs. It is a challenging problem in general, especially if little is known about the dataset, as there are tens, if not hundreds, of machine learning algorithms to choose from. The problem is made significantly more difficult if the distribution of examples across the classes is imbalanced. This requires the use of specialized methods to either change the dataset or […]

Read more

Imbalanced Classification with the Fraudulent Credit Card Transactions Dataset

Last Updated on August 21, 2020 Fraud is a major problem for credit card companies, both because of the large volume of transactions that are completed each day and because many fraudulent transactions look a lot like normal transactions. Identifying fraudulent credit card transactions is a common type of imbalanced binary classification where the focus is on the positive class (is fraud) class. As such, metrics like precision and recall can be used to summarize model performance in terms of […]

Read more
1 800 801 802 803 804 861