How to Use the ColumnTransformer for Data Preparation

Last Updated on August 18, 2020

You must prepare your raw data using data transforms prior to fitting a machine learning model.

This is required to ensure that you best expose the structure of your predictive modeling problem to the learning algorithms.

Applying data transforms like scaling or encoding categorical variables is straightforward when all input variables are the same type. It can be challenging when you have a dataset with mixed types and you want to selectively apply data transforms to some, but not all, input features.

Thankfully, the scikit-learn Python machine learning library provides the ColumnTransformer that allows you to selectively apply data transforms to different columns in your dataset.

In this tutorial, you will discover how to use the ColumnTransformer to selectively apply data transforms to columns in a dataset with mixed data types.

After completing this tutorial, you will know:

  • The challenge of using data transformations with datasets that have mixed data types.
  • How to define, fit, and use the ColumnTransformer to selectively apply data transforms to columns.
  • How to work through a real dataset with mixed data types and use the ColumnTransformer to apply different transforms to categorical and numerical data columns.

Kick-start
To finish reading, please visit source site