Dialog-based multi-item recommendation using automatic evaluation

Euisok Chung; Hyun Woo Kim; Byunghyun Yoo; Ran Han; Jeongmin Yang; Hwa Jeon Song

doi:10.4218/etrij.2022-0333

ETRI Journal, Volume 46, Issue 2, Pages 277-289, 2024
pISSN 1225-6463 / eISSN 2233-7326

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Euisok Chung (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute);
Hyun Woo Kim (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute);
Byunghyun Yoo (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute);
Ran Han (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute);
Jeongmin Yang (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute);
Hwa Jeon Song (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)

Received: 2022.09.01
Accepted: 2023.10.24
Published: 2024.04.20

https://doi.org/10.4218/etrij.2022-0333

Abstract

This paper describes a neural network-based application that takes dialog context as input, recommends multiple items, and simultaneously outputs a response sentence. We frame multi-item recommendation concretely as recommending a set of clothing items, which requires a multimodal fusion approach that can process both clothing-related text and images. We also examine how a pretrained language model can meet the requirements of the downstream models. To this end, we propose gate-based multimodal fusion and multiprompt learning built on a pretrained language model. In addition, we propose an automatic evaluation technique to address the one-to-many mapping problem of multi-item recommendation. We construct a Korean fashion-domain multimodal dataset and use it for testing, and we verify various experimental settings with the automatic evaluation method. The results show that the proposed method yields confidence scores for multi-item recommendation results, in contrast to traditional accuracy-based evaluation.

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government (23ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence Systems).
