Abstract
Time series forecasting can be enhanced by integrating data modalities beyond the past observations of the target series. This paper introduces the Multimodal Block Transformer, a novel architecture that combines multivariate time series data with static multimodal information, which remains invariant over time, to improve forecasting accuracy. At the core of the architecture is the Block Attention mechanism, which efficiently captures dependencies within multivariate time series by condensing multiple time series variables into a single unified sequence. This unified temporal representation is then fused with embeddings of the other modalities to generate a non-autoregressive multi-horizon forecast. The model was evaluated on a dataset of daily movie gross revenues paired with multimodal information about the corresponding films. Experimental results demonstrate that the Multimodal Block Transformer outperforms state-of-the-art models in both multivariate and multimodal time series forecasting.
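To make the high-level description above concrete, the following is a minimal PyTorch sketch of the general idea, not the paper's actual implementation: all class names (`BlockAttention`, `MultimodalForecaster`), the linear projection used to condense variables into block tokens, the mean pooling, and the concatenation-based fusion are illustrative assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class BlockAttention(nn.Module):
    """Illustrative sketch: condense V variables at each time step into one
    'block' token, then apply temporal self-attention over the unified sequence."""
    def __init__(self, num_vars: int, d_model: int, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(num_vars, d_model)   # assumed fusion: one token per time step
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, num_vars) -> blocks: (batch, T, d_model)
        blocks = self.proj(x)
        out, _ = self.attn(blocks, blocks, blocks)  # dependencies across time steps
        return out

class MultimodalForecaster(nn.Module):
    """Fuse the unified temporal representation with embeddings of static
    (time-invariant) modalities and emit all H horizons at once,
    i.e., a non-autoregressive multi-horizon forecast."""
    def __init__(self, num_vars: int, d_model: int, static_dim: int, horizon: int):
        super().__init__()
        self.block_attn = BlockAttention(num_vars, d_model)
        self.static_proj = nn.Linear(static_dim, d_model)  # assumed static-modality embedding
        self.head = nn.Linear(2 * d_model, horizon)

    def forward(self, x: torch.Tensor, static_feats: torch.Tensor) -> torch.Tensor:
        temporal = self.block_attn(x).mean(dim=1)      # pool block sequence: (batch, d_model)
        static = self.static_proj(static_feats)        # (batch, d_model)
        fused = torch.cat([temporal, static], dim=-1)  # simple concatenation fusion (assumption)
        return self.head(fused)                        # (batch, horizon) in a single pass

# Usage example with dummy shapes: 8 revenue-related variables over 30 days,
# a 16-dimensional static movie embedding, and a 7-day forecast horizon.
model = MultimodalForecaster(num_vars=8, d_model=64, static_dim=16, horizon=7)
forecast = model(torch.randn(4, 30, 8), torch.randn(4, 16))  # (4, 7)
```

The key design point this sketch tries to convey is that attention operates over T block tokens rather than T×V variable tokens, which is how condensing the variables keeps the attention cost independent of the number of series.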