How to Develop Baseline Forecasts for Multi-Site Multivariate Air Pollution Time Series Forecasting

Last Updated on August 28, 2020

Real-world time series forecasting is challenging for a whole host of reasons not limited to problem features such as having multiple input variables, the requirement to predict multiple time steps, and the need to perform the same type of prediction for multiple physical sites.

The EMC Data Science Global Hackathon dataset, or the ‘Air Quality Prediction‘ dataset for short, describes weather conditions at multiple sites and requires a prediction of air quality measurements over the subsequent three days.

An important first step when working with a new time series forecasting dataset is to develop a baseline in model performance by which the skill of all other more sophisticated strategies can be compared. Baseline forecasting strategies are simple and fast. They are referred to as ‘naive’ strategies because they assume very little or nothing about the specific forecasting problem.

In this tutorial, you will discover how to develop naive forecasting methods for the multistep multivariate air pollution time series forecasting problem.

After completing this tutorial, you will know: