http://dx.doi.org/10.4218/etrij.2020-0245

Sentence model based subword embeddings for a dialog system  

Chung, Euisok (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
Kim, Hyun Woo (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
Song, Hwa Jeon (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
Publication Information
ETRI Journal, vol. 44, no. 4, 2022, pp. 599-612
Abstract
This study focuses on improving a word embedding model to enhance the performance of downstream tasks such as dialog systems. To improve traditional word embedding models such as skip-gram, it is critical to refine the word features and expand the context model. In this paper, we approach the word model from the perspective of subword embedding and extend the context model by integrating various sentence models. Our proposed sentence model is a subword-based skip-thought model that integrates self-attention and relative position encoding techniques. We also propose a clustering-based dialog model for downstream task verification and evaluate its relationship with the sentence-model-based subword embedding technique. The proposed subword embedding method produces better results than previous methods on word- and sentence-similarity evaluations. In addition, on the downstream verification task, the clustering-based dialog system improves on the FastText results of previous research by up to 4.86%.
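The embedding described above builds on FastText-style subword composition: a word vector is assembled from the vectors of its character n-grams, so rare and unseen words still receive useful representations. The sketch below illustrates only that composition step under stated assumptions; the 3-6 n-gram range, the hash-bucket lookup, the table size, and all names (SubwordEmbedder, embed_word, embed_sentence) are illustrative, not the authors' implementation. The paper's actual sentence model, a skip-thought encoder with self-attention and relative position encoding, is considerably richer than the averaging shown here.

```python
import numpy as np

def subword_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers
    ('<', '>'); the 3-6 range mirrors FastText defaults (assumption)."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams

class SubwordEmbedder:
    """Hypothetical sketch: look up each n-gram vector by hashing into a
    fixed table and average the rows into a word vector."""

    def __init__(self, buckets=2**16, dim=100, seed=0):
        # Randomly initialized here for illustration; in practice the table
        # is trained with a skip-gram-style objective, and real models use
        # far more buckets (FastText defaults to ~2M).
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.1, size=(buckets, dim))
        self.buckets = buckets

    def embed_word(self, word):
        # Python's built-in hash() is salted per process; a stable hash
        # (for example, FNV) would be used in a real system.
        rows = [hash(g) % self.buckets for g in subword_ngrams(word)]
        return self.table[rows].mean(axis=0)

    def embed_sentence(self, sentence):
        # Crude bag-of-words average, only to make the example end-to-end;
        # the paper replaces this with a trained sentence model.
        return np.mean([self.embed_word(w) for w in sentence.split()], axis=0)

# Usage: cosine similarity between two sentence vectors.
emb = SubwordEmbedder()
a = emb.embed_sentence("book a table")
b = emb.embed_sentence("reserve a table")
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For the downstream task described in the abstract, sentence vectors of this kind could be clustered (for example, with k-means) and a response retrieved from the cluster nearest to the query vector; this is one plausible reading of a clustering-based dialog model, not a specification of the authors' system.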
Keywords
dialog; embedding; sentence model; subword