[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5392/JKCA.2022.22.01.138

Improved Transformer Model for Multimodal Fashion Recommendation Conversation System

Park, Yeong Joon (NHN Diquest)
Jo, Byeong Cheol (NHN Diquest)
Lee, Kyoung Uk (NHN Diquest)
Kim, Kyung Sun (NHN Diquest)

Publication Information

The Journal of the Korea Contents Association / v.22, no.1, 2022, pp. 138-147

Abstract

Chatbots have recently been applied in many fields with good results, and e-commerce platforms are increasingly trying to use them for product recommendation in online shopping malls. This paper addresses a conversation system that recommends the fashion items a user wants, based on the conversation between the user and the system and on fashion image information. We build on the transformer, a model that currently performs well in AI fields such as natural language processing, speech recognition, and image recognition. We propose an improved multimodal transformer model that raises recommendation accuracy by using dialogue (text) and fashion (image) information together in data preprocessing and data representation. We also propose a method that improves accuracy by analyzing and improving the data. The proposed system achieves a recommendation accuracy of 0.6563 WKT (weighted Kendall's tau), improving on the existing system's 0.3372 WKT by 0.3191 WKT.
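
The WKT score reported above compares a predicted ranking of items with a reference ranking, giving more weight to disagreements near the top of the list. The abstract does not state which weighting scheme the evaluation used, so the following is only a minimal sketch under an assumption: each pair of items is weighted by the sum of 1/(rank + 1) for its two members, with ranks taken from the reference ordering. The function name `weighted_kendall_tau` and the sample scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def weighted_kendall_tau(reference, predicted):
    """Weighted Kendall's tau between two score lists (higher score = better).

    Assumption: pair weights are additive and hyperbolic, based on the
    0-based rank of each item in the reference ordering. The paper's
    exact weighting is not given in the abstract.
    """
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long lists with at least two items")

    # Rank items by reference score: the best item gets rank 0.
    order = sorted(range(n), key=lambda i: -reference[i])
    rank = [0] * n
    for position, item in enumerate(order):
        rank[item] = position

    num = norm_ref = norm_pred = 0.0
    for i, j in combinations(range(n), 2):
        w = 1.0 / (rank[i] + 1) + 1.0 / (rank[j] + 1)
        a = sign(reference[i] - reference[j])
        b = sign(predicted[i] - predicted[j])
        num += w * a * b        # +w if the pair agrees, -w if it disagrees
        norm_ref += w * a * a   # tied pairs contribute nothing to the norm
        norm_pred += w * b * b

    if norm_ref == 0 or norm_pred == 0:
        raise ValueError("a constant ranking has no defined correlation")
    return num / sqrt(norm_ref * norm_pred)


# Hypothetical scores for three candidate outfits (not data from the paper).
reference = [3, 2, 1]
swap_at_bottom = weighted_kendall_tau(reference, [3, 1, 2])
swap_at_top = weighted_kendall_tau(reference, [2, 3, 1])
```

Under this weighting, swapping the two lowest-ranked items costs less than swapping the two highest-ranked ones, so `swap_at_bottom` is larger than `swap_at_top`. That top-heavy behavior is what distinguishes the weighted score from plain Kendall's tau.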

Keywords

Dialogue System; Transformer; Multimodal; NLP; AI

