Sentence model based subword embeddings for a dialog system

  • Chung, Euisok (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Kim, Hyun Woo (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Song, Hwa Jeon (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
  • Received : 2020.06.15
  • Accepted : 2022.03.15
  • Published : 2022.08.10

Abstract

This study focuses on improving a word embedding model to enhance the performance of downstream tasks such as dialog systems. To improve traditional word embedding models, such as skip-gram, it is critical to refine the word features and to expand the context model. In this paper, we approach the word model from the perspective of subword embedding and extend the context model by integrating various sentence models. Our proposed sentence model is a subword-based skip-thought model that integrates self-attention and relative position encoding. We also propose a clustering-based dialog model for downstream-task verification and evaluate its relationship with the sentence-model-based subword embedding technique. The proposed subword embedding method outperforms previous methods on word- and sentence-similarity evaluations. In addition, on the downstream-task verification, a clustering-based dialog system, it demonstrates an improvement of up to 4.86% over the FastText results reported in previous research.
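
As a rough illustration of the subword embedding idea the abstract builds on (not the authors' exact model), the sketch below composes a word vector as the sum of its character n-gram vectors in the FastText style; the n-gram range, vector dimensionality, and random initialization are illustrative assumptions only.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, padded with boundary markers (FastText-style)."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

# Toy subword-embedding table; real implementations hash n-grams into a fixed-size bucket table
# and learn the vectors with a skip-gram-style objective rather than leaving them random.
rng = np.random.default_rng(0)
dim = 8
subword_vectors = {}

def word_vector(word):
    """A word vector formed as the sum of its subword (character n-gram) vectors."""
    vecs = []
    for g in char_ngrams(word):
        if g not in subword_vectors:
            subword_vectors[g] = rng.normal(scale=0.1, size=dim)
        vecs.append(subword_vectors[g])
    return np.sum(vecs, axis=0)

print(word_vector("dialog").shape)  # (8,)
```

Because the word vector is built from shared n-gram vectors, rare or unseen words still receive a representation, which is the property the paper's subword-based sentence model relies on.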

Keywords

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [22ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System].
