Framework for Data Preparation Techniques in Machine Learning

Last Updated on July 17, 2020 There are a vast number of data preparation techniques that could be used on a predictive modeling project. In some cases, the distribution of the data or the requirements of a machine learning model may suggest the data preparation needed, although this is rarely the case given the complexity and high dimensionality of the data, the ever-increasing parade of new machine learning algorithms, and the limited, all-too-human capacity of the practitioner. Instead, […]

Read more

How to Grid Search Data Preparation Techniques

Last Updated on August 17, 2020 Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data preparation techniques to transform the raw data to best meet the expectations of the algorithm. This is slow, expensive, […]

Read more
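
The excerpt above is cut off before it describes the alternative, so the following is only a minimal sketch of what grid searching data preparation could look like, not the post's own code. It assumes scikit-learn's `Pipeline` and `GridSearchCV`; the synthetic dataset, the logistic regression model, and the three candidate preparation steps are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# The "prep" step is a placeholder that the grid search swaps out.
pipeline = Pipeline([
    ("prep", "passthrough"),
    ("model", LogisticRegression(max_iter=1000)),
])

# Candidate data preparation steps, including no preparation at all.
param_grid = {"prep": ["passthrough", StandardScaler(), MinMaxScaler()]}

# Each candidate is scored with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=5)
search.fit(X, y)

best_prep = search.best_params_["prep"]
best_score = search.best_score_
print(best_prep, round(best_score, 3))
```

Treating the preparation step as a hyperparameter keeps the transform inside each cross-validation fold, so it is fit on the training folds only.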

How to Create Custom Data Transforms for Scikit-Learn

Last Updated on July 19, 2020 The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often performed manually on the data, requiring custom code. The scikit-learn library provides a way to wrap these custom data transforms in a standard way so […]

Read more
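
As a rough illustration of wrapping a custom cleaning step, the sketch below uses scikit-learn's `FunctionTransformer`. The helper `drop_low_cardinality_columns` and its `min_unique` parameter are hypothetical names introduced here, and "few distinct values" is one assumed reading of the cleaning operation; the truncated excerpt does not say which mechanism the post uses.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def drop_low_cardinality_columns(X, min_unique=2):
    """Hypothetical helper: keep columns with at least `min_unique` distinct values."""
    X = np.asarray(X)
    keep = [j for j in range(X.shape[1]) if len(np.unique(X[:, j])) >= min_unique]
    return X[:, keep]


# Wrap the plain function so it exposes the usual fit/transform interface.
transform = FunctionTransformer(drop_low_cardinality_columns, kw_args={"min_unique": 2})

# The middle column is constant, so it carries no information.
X = np.array([[1.0, 7.0, 0.1],
              [2.0, 7.0, 0.2],
              [3.0, 7.0, 0.3]])

X_clean = transform.fit_transform(X)
print(X_clean.shape)
```

Because the wrapped function is stateless, the columns to drop are recomputed on every call; a transform that must remember its choice from the training data would need a small custom class instead.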

Add Binary Flags for Missing Values for Machine Learning

Last Updated on August 17, 2020 Missing values can cause problems when modeling classification and regression prediction problems with machine learning algorithms. A common approach is to replace missing values with a calculated statistic, such as the mean of the column. This allows the dataset to be modeled as per normal but gives no indication to the model that the row originally contained missing values. One approach to address this issue is to include additional binary flag input features that […]

Read more
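
The idea in the excerpt above, mean imputation plus a binary flag for each imputed column, can be sketched with scikit-learn's `SimpleImputer` and its `add_indicator` option. The tiny array is made up for illustration, and this may not be the exact approach the post takes.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (illustrative): the first column has one missing value.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])

# Replace missing values with the column mean and append a 0/1 flag
# column for every input column that had missing values during fit.
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_flagged = imputer.fit_transform(X)
print(X_flagged)
```

Here the result has three columns: the two imputed inputs followed by one flag column, which is 1 only for the row whose first value was missing.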

How to Selectively Scale Numerical Input Variables for Machine Learning

Last Updated on August 17, 2020 Many machine learning models perform better when input variables are carefully transformed or scaled prior to modeling. It is convenient, and therefore common, to apply the same data transforms, such as standardization and normalization, equally to all input variables. This can achieve good results on many problems. Nevertheless, better results may be achieved by carefully selecting which data transform to apply to each input variable prior to modeling. In this tutorial, you will discover […]

Read more
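
One way to apply different scaling to different input variables, sketched here under the assumption that scikit-learn's `ColumnTransformer` is an acceptable tool for the job (the truncated excerpt does not name one). The column assignments are arbitrary and purely illustrative.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data with three numerical input variables.
X = np.array([[1.0, 100.0, 5.0],
              [2.0, 200.0, 6.0],
              [3.0, 300.0, 7.0]])

# Normalize column 0, standardize column 1, and leave column 2 untouched.
selective = ColumnTransformer(
    transformers=[
        ("normalize", MinMaxScaler(), [0]),
        ("standardize", StandardScaler(), [1]),
    ],
    remainder="passthrough",
)

X_scaled = selective.fit_transform(X)
print(X_scaled)
```

The output columns follow the order of the listed transformers, with passed-through columns appended at the end.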

Open Source Deep Learning Frameworks and Visual Analytics

Deep Learning is gaining more and more traction. It focuses on one area of Machine Learning: Artificial Neural Networks. This article explains why Deep Learning is a game changer in analytics, when to use it, and how Visual Analytics allows business analysts to leverage the analytic models built by a (citizen) data scientist. What are Deep Learning and Artificial Neural Networks? Deep Learning is the modern buzzword for artificial neural networks, one of many concepts and algorithms in machine learning […]

Read more