Multimodal Block Transformer for Multimodal Time Series Forecasting

  • Sungho Park (Dept. of Applied Artificial Intelligence, Korea University)
  • Published: 2024.10.31

Abstract

Time series forecasting can be enhanced by integrating various data modalities beyond the past observations of the target time series. This paper introduces the Multimodal Block Transformer, a novel architecture that incorporates multivariate time series data alongside multimodal static information, which remains invariant over time, to improve forecasting accuracy. The core feature of this architecture is the Block Attention mechanism, designed to efficiently capture dependencies within multivariate time series by condensing multiple time series variables into a single unified sequence. This unified temporal representation is then fused with other modality embeddings to generate a non-autoregressive multi-horizon forecast. The model was evaluated on a dataset containing daily movie gross revenues and corresponding multimodal information about movies. Experimental results demonstrate that the Multimodal Block Transformer outperforms state-of-the-art models in both multivariate and multimodal time series forecasting.
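The pipeline described in the abstract can be sketched in code. The following is a minimal PyTorch illustration, not the paper's implementation: the exact Block Attention formulation is not given here, so a projection across the variable axis plus standard self-attention stands in for it, and every module name and hyperparameter below is an assumption. It shows the three stages the abstract describes: condensing the multivariate series into a single unified sequence, fusing that representation with a time-invariant multimodal embedding, and emitting all forecast horizons at once.

```python
# A minimal sketch, assuming PyTorch. Illustrative only: the paper's Block
# Attention mechanism is approximated here by a per-step projection over the
# variable axis followed by ordinary temporal self-attention.
import torch
import torch.nn as nn

class BlockAttentionForecaster(nn.Module):
    # All names and hyperparameters are illustrative assumptions.
    def __init__(self, n_vars, d_model, d_static, horizon, n_heads=4):
        super().__init__()
        self.var_proj = nn.Linear(n_vars, d_model)       # condense V variables per time step
        self.temporal = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )                                                # self-attention over the unified sequence
        self.static_proj = nn.Linear(d_static, d_model)  # embed static multimodal features
        self.head = nn.Linear(2 * d_model, horizon)      # all H future steps in one shot

    def forward(self, x, static):
        # x: (batch, T, n_vars) past observations; static: (batch, d_static)
        z = self.var_proj(x)               # unify variables -> (batch, T, d_model)
        z = self.temporal(z)               # capture temporal dependencies
        z = z.mean(dim=1)                  # pool the sequence to a summary vector
        s = self.static_proj(static)       # static multimodal embedding
        fused = torch.cat([z, s], dim=-1)  # fuse temporal and static representations
        return self.head(fused)            # (batch, horizon), non-autoregressive

# Toy usage: 8 movies, 30 past days, 5 series variables,
# 16-dim static features, 7-day forecast horizon
model = BlockAttentionForecaster(n_vars=5, d_model=32, d_static=16, horizon=7)
y_hat = model(torch.randn(8, 30, 5), torch.randn(8, 16))
print(y_hat.shape)  # torch.Size([8, 7])
```

Producing all horizons from a single head, as sketched above, matches the non-autoregressive multi-horizon design the abstract describes and avoids the error accumulation of step-by-step decoding.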
