Two-dimensional attention-based multi-input LSTM for time series prediction

  • Kim, Eun Been (Department of Applied Statistics, Chung-Ang University) ;
  • Park, Jung Hoon (Department of Applied Statistics, Chung-Ang University) ;
  • Lee, Yung-Seop (Department of Statistics, Dongguk University) ;
  • Lim, Changwon (Department of Applied Statistics, Chung-Ang University)
  • Received : 2020.08.19
  • Accepted : 2020.12.23
  • Published : 2021.01.31

Abstract

Time series prediction is an area of broad interest. Prediction algorithms are widely used in fields such as stock prices, temperature, energy, and weather forecasting, and both classical models and recurrent neural networks (RNNs) have been actively developed for this purpose. Since the attention mechanism was introduced into neural network models, many new models with improved performance have been developed; models that apply attention twice have also recently been proposed, yielding further performance gains. In this paper, we consider time series prediction by applying attention twice to an RNN model. The proposed model introduces H-attention and T-attention, which select useful information from the output values and the time steps, respectively. We conduct experiments on stock price, temperature, and energy data and confirm that the proposed model outperforms existing models.
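The following is a minimal PyTorch sketch of the two-dimensional attention idea summarized in the abstract: an LSTM encoder whose outputs are weighted once over time steps (T-attention) and once over hidden output dimensions (H-attention) before the final prediction. The layer names, sizes, and the way the two attention maps are combined here are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch only: layer names, sizes, and the combination of the two
# attention maps are assumptions, not the paper's exact specification.
import torch
import torch.nn as nn


class TwoDimAttentionLSTM(nn.Module):
    def __init__(self, n_features, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        # T-attention: one score per time step
        self.t_score = nn.Linear(hidden_size, 1)
        # H-attention: one score per hidden (output) unit
        self.h_score = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):                                   # x: (batch, time, features)
        h, _ = self.lstm(x)                                 # h: (batch, time, hidden)
        # attention over time steps
        t_weights = torch.softmax(self.t_score(h), dim=1)   # (batch, time, 1)
        # attention over hidden units, computed from the last hidden state
        h_weights = torch.softmax(self.h_score(h[:, -1]), dim=-1)  # (batch, hidden)
        context = (t_weights * h).sum(dim=1)                # weighted sum over time
        context = context * h_weights                       # reweight hidden dimensions
        return self.out(context)                            # one-step-ahead prediction


if __name__ == "__main__":
    model = TwoDimAttentionLSTM(n_features=5)
    x = torch.randn(8, 30, 5)        # 8 series, 30 time steps, 5 input variables
    print(model(x).shape)            # torch.Size([8, 1])
```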

Keywords
