http://dx.doi.org/10.4218/etrij.2020-0245

Sentence model based subword embeddings for a dialog system  

Chung, Euisok (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
Kim, Hyun Woo (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
Song, Hwa Jeon (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
Publication Information
ETRI Journal, vol. 44, no. 4, 2022, pp. 599-612
Abstract
This study focuses on improving a word embedding model to enhance the performance of downstream tasks such as dialog systems. To improve traditional word embedding models such as skip-gram, it is critical to refine the word features and expand the context model. In this paper, we approach the word model from the perspective of subword embedding and extend the context model by integrating various sentence models. Our proposed sentence model is a subword-based skip-thought model that integrates self-attention and relative position encoding techniques. We also propose a clustering-based dialog model for downstream task verification and evaluate its relationship with the sentence-model-based subword embedding technique. The proposed subword embedding method produces better results than previous methods on word- and sentence-similarity evaluations. In addition, on the downstream verification task, the clustering-based dialog system improves on the FastText results of previous research by up to 4.86%.
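The embedding described above builds on FastText-style subword composition: a word vector is assembled from the vectors of its character n-grams, so rare and unseen words still receive useful representations. The sketch below illustrates only that composition step under stated assumptions; the 3-6 n-gram range, the hash-bucket lookup, the table size, and all names (SubwordEmbedder, embed_word, embed_sentence) are illustrative, not the authors' implementation. The paper's actual sentence model, a skip-thought encoder with self-attention and relative position encoding, is considerably richer than the averaging shown here.

```python
import numpy as np

def subword_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers
    ('<', '>'); the 3-6 range mirrors FastText defaults (assumption)."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams

class SubwordEmbedder:
    """Hypothetical sketch: look up each n-gram vector by hashing into a
    fixed table and average the rows into a word vector."""

    def __init__(self, buckets=2**16, dim=100, seed=0):
        # Randomly initialized here for illustration; in practice the table
        # is trained with a skip-gram-style objective, and real models use
        # far more buckets (FastText defaults to ~2M).
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.1, size=(buckets, dim))
        self.buckets = buckets

    def embed_word(self, word):
        # Python's built-in hash() is salted per process; a stable hash
        # (for example, FNV) would be used in a real system.
        rows = [hash(g) % self.buckets for g in subword_ngrams(word)]
        return self.table[rows].mean(axis=0)

    def embed_sentence(self, sentence):
        # Crude bag-of-words average, only to make the example end-to-end;
        # the paper replaces this with a trained sentence model.
        return np.mean([self.embed_word(w) for w in sentence.split()], axis=0)

# Usage: cosine similarity between two sentence vectors.
emb = SubwordEmbedder()
a = emb.embed_sentence("book a table")
b = emb.embed_sentence("reserve a table")
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For the downstream task described in the abstract, sentence vectors of this kind could be clustered (for example, with k-means) and a response retrieved from the cluster nearest to the query vector; this is one plausible reading of a clustering-based dialog model, not a specification of the authors' system.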
Keywords
dialog; embedding; sentence model; subword