ACML 2020 Tutorial: Forecasting for Data Scientists

18 November 2020


Though machine learning researchers have claimed for decades that their methods yield great performance for time series forecasting, until recently machine learning methods were unable to outperform even simple benchmarks in forecasting competitions and played no role in practical applications. This has changed in the last 3-4 years, with machine learning methods winning several prestigious competitions. The models are now competitive because more series, and longer series due to higher sampling rates, are typically available. In this tutorial, we briefly recap the history of the field of forecasting and its development parallel to machine learning, and then discuss recent developments in the field: learning across series with global models; machine learning methods such as recurrent neural networks, CNNs, and other models; and how these are now able to outperform traditional methods. We further look into the intricacies of forecast evaluation, and into more advanced topics such as hierarchical forecasting and multivariate forecasting.

Intended audience

This tutorial is intended for people with a background in Data Science or Machine Learning who want to get up to speed with the field of forecasting. It also covers many aspects of how machine learning methods can be used for forecasting, so it will also contain sections of value to forecasters who want to better understand some of the novel techniques that have recently been successful in forecasting competitions.


Tutorial recording - YouTube

Tutorial recording - VideoLectures


Download PDF | Download R code

Tutorial Outline

  1. Introduction and Motivation
    • Good and bad forecasting problems
    • What can we forecast?
  2. Traditional (statistical univariate) forecasting techniques
    • Naive and Mean forecast
    • (Simple) Exponential Smoothing
    • ARIMA
  3. A brief history of forecasting competitions
    • The M competitions (M1, M3, M4, M5)
    • Controversy of Machine Learning vs Statistical methods
    • CIF 2016 competition
  4. Global forecasting models
    • Paradigm shift from local (per-time-series) to global (across-time-series) forecasting models
    • Recent theoretical insights
    • History of global models and Kaggle forecasting competitions
  5. Machine Learning methods for forecasting
    • How to address non-stationarity?
    • Differencing
    • Modelling trend: Detrending, Box-Cox transform, window-wise normalisation
    • Modelling seasonality: Seasonal dummies, Fourier terms, seasonal decompositions, deseasonalisation
    • Normalisation
    • Direct vs iterative predictions
    • Feature engineering
  6. Deep learning for forecasting
    • Recurrent neural networks
    • Convolutional Networks: Causal convolutions, dilations
    • Specialised architectures
    • Multivariate methods
  7. Forecast evaluation
    • Errors and error measures: Scale-free errors, scaled errors, percentage errors, relative errors
    • Fixed and rolling origin evaluation, cross-validation, tests for serial correlation in the residuals
  8. Probabilistic forecasting
    • Analytical prediction intervals
    • Bootstrapping, MCMC sampling
    • Forecasting parameters of distributions
    • Quantile regression (pinball loss)
  9. Special forecasting problems
    • Intermittent time series: zero-inflated models, adapted loss functions
    • Hierarchical time series: classic approaches, optimal reconciliation, probabilistic hierarchical forecasting
  10. Conclusions
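The traditional benchmarks from item 2 of the outline are easy to state in code. Below is a minimal Python sketch of my own (the function names are hypothetical, not from the tutorial's R code) of the naive, seasonal naive, and simple exponential smoothing forecasts:

```python
from typing import List

def naive_forecast(y: List[float], h: int) -> List[float]:
    """Repeat the last observed value for all h future steps."""
    return [y[-1]] * h

def seasonal_naive_forecast(y: List[float], h: int, m: int) -> List[float]:
    """Repeat the last full seasonal cycle of length m."""
    return [y[-m + (i % m)] for i in range(h)]

def ses_forecast(y: List[float], h: int, alpha: float = 0.3) -> List[float]:
    """Simple exponential smoothing: level l_t = alpha*y_t + (1-alpha)*l_{t-1},
    with a flat forecast at the final smoothed level."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * h

series = [10.0, 12.0, 14.0, 11.0, 13.0, 15.0]
print(naive_forecast(series, 2))              # [15.0, 15.0]
print(seasonal_naive_forecast(series, 3, 3))  # [11.0, 13.0, 15.0]
print(ses_forecast(series, 1))
```

Simple as they are, these are exactly the benchmarks that machine learning methods long struggled to beat in the competitions discussed in item 3.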
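The paradigm shift to global models in item 4 of the outline boils down to a change in how training data is constructed: lag windows from all series are pooled into one training set, so a single model is fit across series instead of one model per series. A minimal sketch, with hypothetical names and toy data:

```python
import numpy as np

def make_windows(series_list, window=3):
    """Pool (lag-window, next-value) training pairs across all series,
    so one global model can be fit instead of one model per series."""
    X, y = [], []
    for s in series_list:
        for t in range(window, len(s)):
            X.append(s[t - window:t])
            y.append(s[t])
    return np.array(X), np.array(y)

dataset = [[1.0, 2.0, 3.0, 4.0, 5.0],   # series 1
           [10.0, 20.0, 30.0, 40.0]]    # series 2
X, y = make_windows(dataset, window=3)
print(X.shape, y.shape)  # (3, 3) (3,)
```

Any regression learner (a linear model, gradient boosting, a neural network) can then be fit on `X` and `y`; the global-versus-local distinction is purely about this pooling step, which is also why preprocessing such as per-window normalisation (item 5) matters so much when series live on different scales.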
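Among the error measures in item 7 of the outline, the mean absolute scaled error (MASE) of Hyndman and Koehler (2006) is worth spelling out. A pure-Python sketch of my own (see the tutorial slides for the exact variant used there):

```python
def mase(y_true, y_pred, y_train, m=1):
    """Mean absolute scaled error: out-of-sample forecast MAE divided by
    the in-sample MAE of the seasonal naive method with period m."""
    mae = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    scale = sum(abs(y_train[t] - y_train[t - m])
                for t in range(m, len(y_train))) / (len(y_train) - m)
    return mae / scale

train = [10.0, 12.0, 11.0, 13.0]
# In-sample naive errors: |12-10|, |11-12|, |13-11| -> MAE = 5/3.
print(mase([14.0, 12.0], [13.0, 13.0], train))  # forecast MAE 1.0 / (5/3) ≈ 0.6
```

Values below 1 mean the forecast beats the in-sample naive benchmark on average; because the scale comes from the training data, MASE stays well defined when test values contain zeros, unlike percentage errors.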
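The pinball loss behind quantile regression (item 8 of the outline) is essentially a one-liner; a hedged sketch, assuming a single point forecast for quantile level q:

```python
def pinball_loss(y_true: float, y_quantile: float, q: float) -> float:
    """Pinball (quantile) loss: under-forecasts cost q per unit,
    over-forecasts cost (1 - q) per unit."""
    if y_true >= y_quantile:
        return q * (y_true - y_quantile)
    return (1 - q) * (y_quantile - y_true)

# A 0.9-quantile forecast is penalised 9x more for being too low than too high:
print(pinball_loss(10.0, 8.0, q=0.9))   # 0.9 * 2 = 1.8
print(pinball_loss(10.0, 12.0, q=0.9))  # (1 - 0.9) * 2 ≈ 0.2
```

Minimising the average pinball loss drives a model's output toward the q-th conditional quantile, which is how quantile-regression forecasters produce prediction intervals without making distributional assumptions.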


References

  • K. Bandara, C. Bergmeir, and S. Smyl. Forecasting across time series databases using long short-term memory networks on groups of similar series. arXiv preprint arXiv:1710.03222, 2017.
  • K. Bandara, C. Bergmeir, and H. Hewamalage. LSTM-MSNet: Leveraging forecasts on sets of related time series with multiple seasonal patterns. IEEE Transactions on Neural Networks and Learning Systems, (forthcoming), 2020a.
  • K. Bandara, C. Bergmeir, and S. Smyl. Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach. Expert Systems with Applications, 140:112896, 2020b.
  • J. M. Bates and C. W. Granger. The combination of forecasts. Journal of the Operational Research Society, 20(4): 451–468, 1969.
  • F. Bell and S. Smyl. Forecasting at Uber: An introduction, 2018. Accessed 2 September 2020.
  • S. Ben Taieb, G. Bontempi, A. F. Atiya, and A. Sorjamaa. A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Syst. Appl., 39(8):7067–7083, June 2012.
  • S. Ben Taieb, J. W. Taylor, and R. J. Hyndman. Coherent probabilistic forecasts for hierarchical time series. In International Conference on Machine Learning, pages 3348–3357, 2017.
  • S. Ben Taieb, J. W. Taylor, and R. J. Hyndman. Hierarchical probabilistic forecasting of electricity demand with smart meter data. Journal of the American Statistical Association, pages 1–17, 2020.
  • K. Benidis, S. S. Rangapuram, V. Flunkert, B. Wang, D. Maddix, C. Turkmen, J. Gasthaus, M. Bohlke-Schneider, D. Salinas, L. Stella, et al. Neural forecasting: Introduction and literature overview. arXiv preprint arXiv:2004.10240, 2020.
  • C. Bergmeir and J. M. Benítez. On the use of cross-validation for time series predictor evaluation. Information Sciences, 191:192–213, 2012.
  • C. Bergmeir, R. J. Hyndman, and B. Koo. A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics & Data Analysis, 120:70–83, 2018.
  • C. S. Bojer and J. P. Meldgaard. Kaggle forecasting competitions: An overlooked learning opportunity. International Journal of Forecasting, 2020.
  • A. Borovykh, S. Bohte, and C. W. Oosterlee. Dilated convolutional neural networks for time series forecasting. Journal of Computational Finance, 2018. doi: 10.21314/jcf.2019.358.
  • G. Box and G. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, 1970.
  • P. Burman, E. Chow, and D. Nolan. A cross-validatory method for dependent data. Biometrika, 81(2):351–358, 1994. ISSN 00063444.
  • O. Claveria and S. Torra. Forecasting tourism demand to catalonia: Neural networks vs. time series models. Econ. Model., 36:220–228, Jan. 2014.
  • R. B. Cleveland, W. S. Cleveland, J. McRae, and I. Terpenning. STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6:3–73, 1990.
  • J. D. Croston. Forecasting and stock control for intermittent demands. Journal of the Operational Research Society, 23(3):289–303, 1972.
  • A. Dokumentov and R. J. Hyndman. STR: A seasonal-trend decomposition procedure based on regression. arXiv preprint arXiv:2009.05894, 2020.
  • G. T. Duncan, W. L. Gorr, and J. Szczypula. Forecasting analogous time series. In Principles of forecasting, pages 195–213. Springer, 2001.
  • V. Flunkert, D. Salinas, and J. Gasthaus. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. CoRR, abs/1704.04110, 2017.
  • J. Gasthaus, K. Benidis, Y. Wang, S. S. Rangapuram, D. Salinas, V. Flunkert, and T. Januschowski. Probabilistic forecasting with spline quantile function rnns. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1901–1910, 2019.
  • T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378, 2007.
  • P. Goodwin. The Holt-Winters approach to exponential smoothing: 50 years old and going strong. Foresight: The International Journal of Applied Forecasting, 19:30–33, 2010.
  • H. Hewamalage, C. Bergmeir, and K. Bandara. Recurrent neural networks for time series forecasting: Current status and future directions. International Journal of Forecasting, (forthcoming), 2020.
  • R. Hyndman and Y. Khandakar. Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3):1–22, 2008.
  • R. Hyndman and A. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679–688, 2006.
  • R. Hyndman, A. Koehler, R. Snyder, and S. Grose. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, 18(3):439–454, 2002.
  • R. J. Hyndman. Ten years of forecast reconciliation, 2020a. Accessed 10 November 2020.
  • R. J. Hyndman. A brief history of forecasting competitions. International Journal of Forecasting, 36(1):7–14, 2020b.
  • R. J. Hyndman and G. Athanasopoulos. Forecasting: principles and practice. OTexts, 2018.
  • R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang. Optimal combination forecasts for hierarchical time series. Computational statistics & data analysis, 55(9):2579–2589, 2011.
  • T. Januschowski, J. Gasthaus, Y. Wang, D. Salinas, V. Flunkert, M. Bohlke-Schneider, and L. Callot. Criteria for classifying forecasting methods. International Journal of Forecasting, 36(1):167–177, 2020.
  • A. Jordan, F. Krüger, and S. Lerch. Evaluating probabilistic forecasts with scoringrules. arXiv preprint arXiv:1709.04743, 2017.
  • G. Lai, W.-C. Chang, Y. Yang, and H. Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’18, pages 95–104, New York, NY, USA, 2018. ACM.
  • M. Landry, T. P. Erlinger, D. Patschke, and C. Varrichio. Probabilistic gradient boosting machines for gefcom2014 wind forecasting. International Journal of Forecasting, 32(3):1061–1066, 2016.
  • S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, and X. Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In Advances in Neural Information Processing Systems, pages 5243–5253, 2019.
  • B. Lim, S. O. Arik, N. Loeff, and T. Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv preprint arXiv:1912.09363, 2019.
  • G. M. Ljung and G. E. Box. On a measure of lack of fit in time series models. Biometrika, 65(2):297–303, 1978.
  • S. Makridakis and M. Hibon. Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society: Series A (General), 142(2):97–125, 1979.
  • S. Makridakis and M. Hibon. The M3-competition: Results, conclusions and implications. International Journal of Forecasting, 16(4):451–476, 2000.
  • S. Makridakis, A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, and R. Winkler. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting, 1(2):111–153, 1982.
  • S. Makridakis, E. Spiliotis, and V. Assimakopoulos. The M4 competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 34(4):802–808, 2018a.
  • S. Makridakis, E. Spiliotis, and V. Assimakopoulos. Statistical and machine learning forecasting methods: Concerns and ways forward. PloS one, 13(3):e0194889, 2018b.
  • S. Makridakis, E. Spiliotis, and V. Assimakopoulos. The M5 accuracy competition: Results, findings and conclusions. October 2020.
  • J. Miller. When recurrent models don’t need to be recurrent, 2018. Accessed 10 November 2020.
  • J. Miller and M. Hardt. Stable recurrent models. arXiv preprint arXiv:1805.10369, 2018.
  • P. Montero-Manso and R. J. Hyndman. Principles and algorithms for forecasting groups of time series: Locality and globality. arXiv preprint arXiv:2008.00444, 2020.
  • M. Nelson, T. Hill, W. Remus, and M. O’Connor. Time series forecasting using neural networks: should the data be deseasonalized first? J. Forecast., 18(5):359–367, 1999.
  • A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
  • B. N. Oreshkin, D. Carpov, N. Chapados, and Y. Bengio. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv preprint arXiv:1905.10437, 2019.
  • A. Panagiotelis, P. Gamakumara, G. Athanasopoulos, R. Hyndman, et al. Probabilistic forecast reconciliation: Properties, evaluation and score optimisation. Technical report, Monash University, Department of Econometrics and Business Statistics, 2020.
  • R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In International conference on machine learning, pages 1310–1318, 2013.
  • L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin. Catboost: unbiased boosting with categorical features. In Advances in neural information processing systems, pages 6638–6648, 2018.
  • J. Racine. Consistent cross-validatory model-selection for dependent data: hv-block cross-validation. Journal of Econometrics, 99(1):39–61, 2000.
  • S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and T. Januschowski. Deep state space models for time series forecasting. In Advances in neural information processing systems, pages 7785–7794, 2018.
  • D. Salinas, M. Bohlke-Schneider, L. Callot, R. Medico, and J. Gasthaus. High-dimensional multivariate forecasting with low-rank gaussian copula processes. In Advances in Neural Information Processing Systems, pages 6827–6837, 2019a.
  • D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 2019b. ISSN 0169-2070.
  • R. Sen, H.-F. Yu, and I. S. Dhillon. Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting. In Advances in Neural Information Processing Systems, pages 4837–4846, 2019.
  • R. Sharda and R. B. Patil. Connectionist approach to time series prediction: an empirical test. J. Intell. Manuf., 3 (5):317–323, Oct. 1992.
  • S. Smyl. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting, 36(1):75–85, 2020.
  • S. Smyl and K. Kuber. Data preprocessing and augmentation for multiple short time series forecasting with recurrent neural networks. In 36th International Symposium on Forecasting, 2016.
  • M. Štěpnička and M. Burda. Computational intelligence in forecasting (CIF) 2016 time series forecasting competition. In IEEE WCCI 2016, IJCNN-13 Advances in Computational Intelligence for Applied Time Series Forecasting (ACIATSF), 2016.
  • A. Suilin. Kaggle-web-traffic, 2017. Accessed 19 November 2018.
  • Z. Tang, C. de Almeida, and P. A. Fishwick. Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation, 57(5):303–310, Nov. 1991.
  • J. R. Trapero, N. Kourentzes, and R. Fildes. On the identification of sales forecasting models in the presence of promotions. Journal of the Operational Research Society, 66(2):299–307, Feb 2015. ISSN 1476-9360. doi: 10.1057/jors.2013.174.
  • Y. Wang, A. Smola, D. C. Maddix, J. Gasthaus, D. Foster, and T. Januschowski. Deep factors for forecasting. arXiv preprint arXiv:1905.12417, 2019.
  • Y. Wang, C. Faloutsos, V. Flunkert, J. Gasthaus, and T. Januschowski. Forecasting big time series: theory and practice. In The Web Conference, 2020.
  • R. Wen, K. Torkkola, B. Narayanaswamy, and D. Madeka. A Multi-Horizon quantile recurrent forecaster. In Neural Information Processing Systems, Nov. 2017.
  • S. L. Wickramasuriya, G. Athanasopoulos, and R. J. Hyndman. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526): 804–819, 2019.
  • Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, and C. Zhang. Connecting the dots: Multivariate time series forecasting with graph neural networks. arXiv preprint arXiv:2005.11650, 2020.
  • H.-F. Yu, N. Rao, and I. S. Dhillon. Temporal regularized matrix factorization for high-dimensional time series prediction. In Advances in neural information processing systems, pages 847–855, 2016.
  • G. P. Zhang and D. M. Kline. Quarterly Time-Series forecasting with neural networks. IEEE Trans. Neural Netw., 18(6):1800–1814, Nov. 2007.
  • G. P. Zhang and M. Qi. Neural network forecasting for seasonal and trend time series. Eur. J. Oper. Res., 160(2): 501–514, 2005.
  • H. Zhou, W. Qian, and Y. Yang. Tweedie gradient boosting for extremely unbalanced zero-inflated data. Communications in Statistics-Simulation and Computation, pages 1–23, 2020.
