How to Load, Visualize, and Explore a Multivariate Multistep Time Series Dataset

Last Updated on August 5, 2019

Real-world time series forecasting is challenging for many reasons, including, but not limited to, having multiple input variables, the requirement to predict multiple time steps, and the need to perform the same type of prediction for multiple physical sites.

The EMC Data Science Global Hackathon dataset, or the ‘Air Quality Prediction’ dataset for short, describes weather conditions at multiple sites and requires a prediction of air quality measurements over the subsequent three days.

In this tutorial, you will discover and explore the Air Quality Prediction dataset, which represents a challenging multivariate, multi-site, and multi-step time series forecasting problem.

After completing this tutorial, you will know:

  • How to load and explore the chunk structure of the dataset.
  • How to explore and visualize the input and target variables for the dataset.
  • How to use this new understanding to outline a suite of methods for framing the problem, preparing the data, and modeling the dataset.

Let’s get started.