The Seductive Trap of Black-Box Machine Learning

Last Updated on April 4, 2018

For as long as I have been participating in data mining and machine learning competitions, I have thought about automating my participation. Maybe it shows that I want to solve the problem of building the tool more than I want to solve the problem at hand.

When working on a dataset, I typically spend a disproportionate amount of time thinking about algorithm tuning and running tuning experiments. I am prone to performing post-competition analysis and regret my allocation of time, almost always thinking that more time could be put into feature engineering, exploring different models and hypotheses, and blending.

Automatic Machine Learner from Hell

More than a few times I have sketched the “perfect” system where model running, tuning and result blending is automated and human effort is focused on defining different perspectives on the problem.

I have even built imperfect variations of this system a few times in a few different languages for competitions. The first version was in Java more than 10 years ago. The most recent version was in R with make files and bash scripts about 6 months ago.