Acknowledgement
This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [22ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System].
References
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, Language models are unsupervised multitask learners, OpenAI Blog 1 (2019), no. 8, 9. https://www.techbooky.com/wp-content/uploads/2019/02/BetterLanguage-Models-and-Their-Implications.pdf
- J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1810.04805
- Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, XLNet: Generalized autoregressive pretraining for language understanding, arXiv preprint, 2019. https://doi.org/10.48550/arXiv.1906.08237
- N. Pappas and J. Henderson, Deep residual output layers for neural language generation (Proceedings of ICML), 2019, pp. 5000-5011. https://doi.org/10.48550/arXiv.1905.05513
- S. Kumar and Y. Tsvetkov, Von Mises-Fisher loss for training sequence to sequence models with continuous outputs (The Seventh International Conference on Learning Representations, New Orleans, LA, USA), May 2019. https://openreview.net/forum?id=rJlDnoA5Y7
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality (NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems, Red Hook, NY, USA), Dec. 2013, pp. 3111-3119.
- P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, TACL 5 (2017), 135-146. https://doi.org/10.1162/tacl_a_00051
- R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler, Skip-thought vectors (NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems, Cambridge, MA, USA), Dec. 2015, pp. 3294-3302.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need (NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA), Dec. 2017, pp. 5998-6008.
- P. Shaw, J. Uszkoreit, and A. Vaswani, Self-attention with relative position representations (Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA), 2018, pp. 464-468. https://doi.org/10.18653/v1/N18-2074
- Y. Matsui, K. Ogaki, T. Yamasaki, and K. Aizawa, PQk-means: Billion-scale clustering for product-quantized codes (MM '17: Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA), 2017, pp. 1725-1733. https://doi.org/10.1145/3123266.3123430
- V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, K. A. Persson, G. Ceder, and A. Jain, Unsupervised word embeddings capture latent knowledge from materials science literature, Nature 571 (2019), no. 7763, 95-98. https://doi.org/10.1038/s41586-019-1335-8
- O. Levy and Y. Goldberg, Dependency-based word embeddings (Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA), 2014, pp. 302-308. https://doi.org/10.3115/v1/P14-2050
- B. Athiwaratkun, A. Wilson, and A. Anandkumar, Probabilistic FastText for multi-sense word embeddings (Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia), 2018, pp. 1-11. https://doi.org/10.18653/v1/P18-1001
- F. Tian, H. Dai, J. Bian, B. Gao, R. Zhang, E. Chen, and T. Y. Liu, A probabilistic model for learning multi-prototype word embeddings (Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland), 2014, pp. 151-160.
- D. Kiela, C. Wang, and K. Cho, Dynamic meta-embeddings for improved sentence representations (Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium), 2018, pp. 1466-1477. https://doi.org/10.18653/v1/D18-1176
- S. Park, J. Byun, S. Baek, Y. Cho, and A. Oh, Subword-level word vector representations for Korean (Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia), 2018, pp. 2429-2438. https://doi.org/10.18653/v1/P18-1226
- S. Sasaki, J. Suzuki, and K. Inui, Subword-based compact reconstruction of word embeddings (Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA), 2019, pp. 3498-3508. https://doi.org/10.18653/v1/N19-1353
- B. Heinzerling and M. Strube, BPEmb: Tokenization-free pretrained subword embeddings in 275 languages (Proceedings of the Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan), 2018.
- R. Sennrich, B. Haddow, and A. Birch, Neural machine translation of rare words with subword units (Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany), 2016, pp. 1715-1725. https://doi.org/10.18653/v1/P16-1162
- S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997), no. 8, 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
- T. Kenter, A. Borisov, and M. de Rijke, Siamese CBOW: Optimizing word embeddings for sentence representations (Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany), 2016, pp. 941-951. https://doi.org/10.18653/v1/P16-1089
- E. Chung, H. J. Jeon, S. J. Lee, and J. G. Park, Korean phoneme sequence based word embedding (Proceedings of HCLT), 2017, pp. 225-227. http://www.koreascience.or.kr/article/CFKO201731951960129.page
- H. Chen, X. Liu, D. Yin, and J. Tang, A survey on dialogue systems: Recent advances and new frontiers, SIGKDD Explor. Newsl. 19 (2017), no. 2, 25-35. https://doi.org/10.1145/3166054.3166058
- A. Bordes, Y. Boureau, and J. Weston, Learning end-to-end goal-oriented dialog (Proceedings of ICLR), 2017. https://openreview.net/forum?id=S1Bb3D5gg
- T. H. Wen, D. Vandyke, N. Mrksic, M. Gasic, L. M. Rojas-Barahona, P. H. Su, S. Ultes, and S. Young, A network-based end-to-end trainable task-oriented dialogue system (Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain), 2017, pp. 438-449.
- Y. Wu, W. Wu, C. Xing, M. Zhou, and Z. Li, Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots (Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada), 2017, pp. 496-505. https://doi.org/10.18653/v1/P17-1046
- Z. Ji, Z. Lu, and H. Li, An information retrieval approach to short text conversation, arXiv preprint, 2014. https://doi.org/10.48550/arXiv.1408.6988
- R. Yan, Y. Song, and H. Wu, Learning to respond with deep neural networks for retrieval-based human-computer conversation system (SIGIR '16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy), 2016, pp. 55-64. https://doi.org/10.1145/2911451.2911542
- C. Xing, Y. Wu, W. Wu, Y. Huang, and M. Zhou, Hierarchical recurrent attention network for response generation (Proceedings of AAAI), 2018, pp. 5610-5617. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/16510
- M. Qiu, F. L. Li, S. Wang, X. Gao, Y. Chen, W. Zhao, H. Chen, J. Huang, and W. Chu, AliMe chat: A sequence to sequence and rerank based chatbot engine (Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada), 2017, pp. 498-503. https://doi.org/10.18653/v1/P17-2079
- Y. Song, R. Yan, X. Li, D. Zhao, and M. Zhang, Two are better than one: An ensemble of retrieval- and generation-based dialog systems, arXiv preprint, 2016. https://doi.org/10.48550/arXiv.1610.07149
- Y. Song, R. Yan, C. T. Li, J. Y. Nie, M. Zhang, and D. Zhao, An ensemble of retrieval-based and generation-based human-computer conversation systems (Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence), 2018, pp. 4382-4388. https://doi.org/10.24963/ijcai.2018/609
- H. Cuayahuitl, D. Lee, S. Ryu, S. Choi, I. Hwang, and J. Kim, Deep reinforcement learning for chatbots using clustered actions and human-likeness rewards (Proceedings of IJCNN), 2019, pp. 1-8. https://doi.org/10.48550/arXiv.1908.10331
- R. C. Gunasekara, D. Nahamoo, L. C. Polymenakos, D. E. Ciaurri, J. Ganhotra, and K. P. Fadnis, Quantized dialog - A general approach for conversational systems, Comput. Speech Lang. 54 (2019), 17-30. https://doi.org/10.1016/j.csl.2018.06.003
- H. Perkins and Y. Yang, Dialog intent induction with deep multi-view clustering (Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China), 2019, pp. 4014-4023. https://doi.org/10.18653/v1/D19-1413
- T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, arXiv preprint, 2013. https://doi.org/10.48550/arXiv.1301.3781
- C. Dyer, Notes on noise contrastive estimation and negative sampling, arXiv preprint, 2014. https://doi.org/10.48550/arXiv.1410.8251
- K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation (Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar), 2014, pp. 1724-1734. https://doi.org/10.3115/v1/D14-1179
- L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, Placing search in context: The concept revisited, ACM Trans. Inform. Syst. 20 (2002), no. 1, 116-131. https://doi.org/10.1145/503104.503110
- C. Allen and T. Hospedales, Analogies explained: Towards understanding word embeddings (Proceedings of ICML), 2019, pp. 223-231. https://doi.org/10.48550/arXiv.1901.09813
- D. Kingma and J. Ba, Adam: A method for stochastic optimization (Proceedings of ICLR), 2015. https://doi.org/10.48550/arXiv.1412.6980
- J. Park, KoELECTRA: Pretrained ELECTRA model for Korean, 2020. https://github.com/monologg/KoELECTRA