The Role of Randomization to Address Confounding Variables in Machine Learning

Last Updated on July 31, 2020

A large part of applied machine learning is about running controlled experiments to discover what algorithm or algorithm configuration to use on a predictive modeling problem.

A challenge is that some aspects of the problem and the algorithm, called confounding variables, cannot be controlled (held constant) and must instead be controlled for. An example is the use of randomness in a learning algorithm, such as random initialization or random choices during learning.

The solution is to use randomness in a way that has become a standard in applied machine learning. We can learn more about the rationale for using randomness in controlled experiments by looking briefly at why randomness is used to manage confounding variables in medicine through the use of randomized clinical trials.
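
As a rough illustration of what this can look like in practice, the sketch below repeats the evaluation of a stochastic learner under many random seeds and summarizes the scores, rather than trusting a single run. It is a minimal sketch, not code from this post: the toy dataset, the perceptron learner, and the names make_data, train_and_score, and repeated_evaluation are all assumptions made for the example, and only the Python standard library is used.

```python
import random
import statistics


def make_data(rng, n=200):
    """Hypothetical toy data: label is 1 when x0 + x1 > 1, with 10% label noise."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = 1 if x[0] + x[1] > 1.0 else 0
        if rng.random() < 0.1:  # flip some labels so the problem is not trivial
            y = 1 - y
        data.append((x, y))
    return data


def predict(w, b, x):
    """Threshold a linear score to get a 0/1 prediction."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_and_score(train, test, seed, epochs=5, lr=0.1):
    """Train a perceptron whose randomness depends only on `seed`; return test accuracy."""
    rng = random.Random(seed)
    # Source of randomness 1: random weight initialization.
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    b = rng.uniform(-1, 1)
    rows = list(train)
    for _ in range(epochs):
        # Source of randomness 2: random order of training examples.
        rng.shuffle(rows)
        for x, y in rows:
            err = y - predict(w, b, x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    correct = sum(1 for x, y in test if predict(w, b, x) == y)
    return correct / len(test)


def repeated_evaluation(n_repeats=30):
    """Hold the data fixed and vary only the learner's seed across repeats."""
    data_rng = random.Random(0)
    train = make_data(data_rng, 200)
    test = make_data(data_rng, 200)
    return [train_and_score(train, test, seed) for seed in range(n_repeats)]


scores = repeated_evaluation()
print(
    f"accuracy over {len(scores)} seeds: "
    f"mean={statistics.mean(scores):.3f}, std={statistics.stdev(scores):.3f}"
)
```

The data is held fixed here so that any spread in the scores comes from the learner's own randomness. Reporting the mean and standard deviation over the repeats is one common way to control for that randomness when comparing algorithms or configurations.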

In this post, you will discover confounding variables and how we can address them using the tool of randomization.

After reading this post, you will know:

Confounding variables are correlated with both the independent and dependent variables; they confuse the effects and impact the results of experiments.
Applied machine learning is concerned with controlled experiments that do suffer from known confounding variables.
Randomization of experiments is the key to controlling for confounding variables.