Acknowledgement
This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [22ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System].
References
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, Language models are unsupervised multitask learners, OpenAI Blog 1 (2019), no. 8, 9. https://www.techbooky.com/wp-content/uploads/2019/02/BetterLanguage-Models-and-Their-Implications.pdf
- J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1810.04805
- Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, XLNet: Generalized autoregressive pretraining for language understanding, arXiv preprint, 2019. https://doi.org/10.48550/arXiv.1906.08237
- N. Pappas and J. Henderson, Deep residual output layers for neural language generation (Proceedings of ICML), 2019, pp. 5000-5011. https://doi.org/10.48550/arXiv.1905.05513
- S. Kumar and Y. Tsvetkov, Von Mises-Fisher loss for training sequence to sequence models with continuous outputs (The Seventh International Conference on Learning Representations, New Orleans, LA, USA), May 2019. https://openreview.net/forum?id=rJlDnoA5Y7
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality (NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems, Red Hook, NY, USA), Dec. 2013, pp. 3111-3119.
- P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, TACL 5 (2017), 135-146. https://doi.org/10.1162/tacl_a_00051
- R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler, Skip-thought vectors (NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems, Cambridge, MA, USA), Dec. 2015, pp. 3294-3302.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need (NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA), Dec. 2017, pp. 5998-6008.
- P. Shaw, J. Uszkoreit, and A. Vaswani, Self-attention with relative position representations (Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA), 2018, pp. 464-468. https://doi.org/10.18653/v1/N18-2074
- Y. Matsui, K. Ogaki, T. Yamasaki, and K. Aizawa, PQk-means: Billion-scale clustering for product-quantized codes (MM '17: Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA), 2017, pp. 1725-1733. https://doi.org/10.1145/3123266.3123430
- V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, K. A. Persson, G. Ceder, and A. Jain, Unsupervised word embeddings capture latent knowledge from materials science literature, Nature 571 (2019), no. 7763, 95-98. https://doi.org/10.1038/s41586-019-1335-8
- O. Levy and Y. Goldberg, Dependency-based word embeddings (Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA), 2014, pp. 302-308. https://doi.org/10.3115/v1/P14-2050
- B. Athiwaratkun, A. Wilson, and A. Anandkumar, Probabilistic FastText for multi-sense word embeddings (Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia), 2018, pp. 1-11. https://doi.org/10.18653/v1/P18-1001
- F. Tian, H. Dai, J. Bian, B. Gao, R. Zhang, E. Chen, and T. Y. Liu, A probabilistic model for learning multi-prototype word embeddings (Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland), 2014, pp. 151-160.
- D. Kiela, C. Wang, and K. Cho, Dynamic meta-embeddings for improved sentence representations (Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium), 2018, pp. 1466-1477. https://doi.org/10.18653/v1/D18-1176
- S. Park, J. Byun, S. Baek, Y. Cho, and A. Oh, Subword-level word vector representations for Korean (Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia), 2018, pp. 2429-2438. https://doi.org/10.18653/v1/P18-1226
- S. Sasaki, J. Suzuki, and K. Inui, Subword-based compact reconstruction of word embeddings (Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA), 2019, pp. 3498-3508. https://doi.org/10.18653/v1/N19-1353
- B. Heinzerling and M. Strube, BPEmb: Tokenization-free pretrained subword embeddings in 275 languages (Proceedings of the Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan), 2018.
- R. Sennrich, B. Haddow, and A. Birch, Neural machine translation of rare words with subword units (Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany), 2016, pp. 1715-1725. https://doi.org/10.18653/v1/P16-1162
- S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997), no. 8, 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
- T. Kenter, A. Borisov, and M. de Rijke, Siamese CBOW: Optimizing word embeddings for sentence representations (Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany), 2016, pp. 941-951. https://doi.org/10.18653/v1/P16-1089
- E. Chung, H. J. Jeon, S. J. Lee, and J. G. Park, Korean phoneme sequence based word embedding (Proceedings of HCLT), 2017, pp. 225-227. http://www.koreascience.or.kr/article/CFKO201731951960129.page
- H. Chen, X. Liu, D. Yin, and J. Tang, A survey on dialogue systems: Recent advances and new frontiers, SIGKDD Explor. Newsl. 19 (2017), no. 2, 25-35. https://doi.org/10.1145/3166054.3166058
- A. Bordes, Y. Boureau, and J. Weston, Learning end-to-end goal-oriented dialog (Proceedings of ICLR), 2017. https://openreview.net/forum?id=S1Bb3D5gg
- T. H. Wen, D. Vandyke, N. Mrksic, M. Gasic, L. M. Rojas-Barahona, P. H. Su, S. Ultes, and S. Young, A network-based end-to-end trainable task-oriented dialogue system (Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain), 2017, pp. 438-449.
- Y. Wu, W. Wu, C. Xing, M. Zhou, and Z. Li, Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots (Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada), 2017, pp. 496-505. https://doi.org/10.18653/v1/P17-1046
- Z. Ji, Z. Lu, and H. Li, An information retrieval approach to short text conversation, arXiv preprint, 2014. https://doi.org/10.48550/arXiv.1408.6988
- R. Yan, Y. Song, and H. Wu, Learning to respond with deep neural networks for retrieval-based human-computer conversation system (SIGIR '16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy), 2016, pp. 55-64. https://doi.org/10.1145/2911451.2911542
- C. Xing, Y. Wu, W. Wu, Y. Huang, and M. Zhou, Hierarchical recurrent attention network for response generation (Proceedings of AAAI), 2018, pp. 5610-5617. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/16510
- M. Qiu, F. L. Li, S. Wang, X. Gao, Y. Chen, W. Zhao, H. Chen, J. Huang, and W. Chu, AliMe chat: A sequence to sequence and rerank based chatbot engine (Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada), 2017, pp. 498-503. https://doi.org/10.18653/v1/P17-2079
- Y. Song, R. Yan, X. Li, D. Zhao, and M. Zhang, Two are better than one: An ensemble of retrieval- and generation-based dialog systems, arXiv preprint, 2016. https://doi.org/10.48550/arXiv.1610.07149
- Y. Song, R. Yan, C. T. Li, J. Y. Nie, M. Zhang, and D. Zhao, An ensemble of retrieval-based and generation-based human-computer conversation systems (Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence), 2018, pp. 4382-4388. https://doi.org/10.24963/ijcai.2018/609
- H. Cuayahuitl, D. Lee, S. Ryu, S. Choi, I. Hwang, and J. Kim, Deep reinforcement learning for chatbots using clustered actions and human-likeness rewards (Proceedings of IJCNN), 2019, pp. 1-8. https://doi.org/10.48550/arXiv.1908.10331
- R. C. Gunasekara, D. Nahamoo, L. C. Polymenakos, D. E. Ciaurri, J. Ganhotra, and K. P. Fadnis, Quantized dialog - A general approach for conversational systems, Comput. Speech Lang. 54 (2019), 17-30. https://doi.org/10.1016/j.csl.2018.06.003
- H. Perkins and Y. Yang, Dialog intent induction with deep multi-view clustering (Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China), 2019, pp. 4014-4023. https://doi.org/10.18653/v1/D19-1413
- T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, arXiv preprint, 2013. https://doi.org/10.48550/arXiv.1301.3781
- C. Dyer, Notes on noise contrastive estimation and negative sampling, arXiv preprint, 2014. https://doi.org/10.48550/arXiv.1410.8251
- K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation (Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar), 2014, pp. 1724-1734. https://doi.org/10.3115/v1/D14-1179
- L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, Placing search in context: The concept revisited, ACM Trans. Inform. Syst. 20 (2002), no. 1, 116-131. https://doi.org/10.1145/503104.503110
- C. Allen and T. Hospedales, Analogies explained: Towards understanding word embeddings (Proceedings of ICML), 2019, pp. 223-231. https://doi.org/10.48550/arXiv.1901.09813
- D. Kingma and J. Ba, Adam: A method for stochastic optimization (Proceedings of ICLR), 2015. https://doi.org/10.48550/arXiv.1412.6980
- J. Park, KoELECTRA: Pretrained ELECTRA model for Korean, 2020. https://github.com/monologg/KoELECTRA