Dialog-based multi-item recommendation using automatic evaluation

  • Euisok Chung (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Hyun Woo Kim (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Byunghyun Yoo (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Ran Han (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Jeongmin Yang (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Hwa Jeon Song (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
  • Received : 2022.09.01
  • Accepted : 2023.10.24
  • Published : 2024.04.20

Abstract

In this paper, we describe a neural network-based application that takes dialog context as input, recommends multiple items, and simultaneously outputs a response sentence. We specify multi-item recommendation as the recommendation of a set of clothing items. This requires a multimodal fusion approach that can process both clothing-related text and images. We also examine how a pretrained language model can meet the requirements of the downstream models. To this end, we propose gate-based multimodal fusion and multiprompt learning based on a pretrained language model. In addition, we propose an automatic evaluation technique to address the one-to-many mapping problem of multi-item recommendation. A Korean fashion-domain multimodal dataset is constructed and used for testing. Various experimental settings are verified using the automatic evaluation method. The results show that the proposed method can be used to obtain confidence scores for multi-item recommendation results, unlike traditional accuracy evaluation.
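The abstract does not spell out the gate-based fusion, so the following is only a minimal sketch of one common gating formulation, not the authors' model. It assumes the text and image features have already been projected to the same dimension; the function name `gated_fusion` and the parameters `weights` and `bias` are hypothetical and would be learned in a real network.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed formulation (illustrative only):
        gate_i  = sigmoid(weights[i] . [text; image] + bias[i])
        fused_i = gate_i * text_i + (1 - gate_i) * image_i
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")

    # The gate for each dimension looks at both modalities at once.
    concat = list(text_vec) + list(image_vec)

    fused = []
    for i, (t, v) in enumerate(zip(text_vec, image_vec)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        # g near 1 keeps the text feature, g near 0 keeps the image feature.
        fused.append(g * t + (1.0 - g) * v)
    return fused


# Toy usage with hand-picked parameters (hypothetical values).
text_features = [1.0, 2.0]
image_features = [3.0, 4.0]
zero_weights = [[0.0] * 4 for _ in range(2)]
print(gated_fusion(text_features, image_features, zero_weights, [0.0, 0.0]))
```

With all-zero weights and biases each gate is 0.5, so the fused vector is the plain average of the two modalities; training would move the gates toward whichever modality is more informative for each dimension.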

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government (23ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence Systems).
