http://dx.doi.org/10.13064/KSSS.2015.7.4.003

Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model  

Kim, Kwang-Ho (Sogang University)
Lee, Donghyun (Sogang University)
Lim, Minkyu (Sogang University)
Kim, Ji-Hwan (Sogang University)
Publication Information
Phonetics and Speech Sciences, Vol. 7, No. 4, 2015, pp. 3-8
Abstract
In this paper, we investigate an input dimension reduction method using continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors are generated with Google's Word2Vec from a large training corpus so that they satisfy the distributional hypothesis, and the 1-of-$|V|$ coded discrete word vectors are replaced with their corresponding continuous word vectors. In our implementation, the input dimension was reduced from 20,000 to 600 when a tri-gram language model was used with a vocabulary of 20,000 words. The total training time was reduced from 30 days to 14 days on the Wall Street Journal training corpus (corpus length: 37M words).
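The core idea is a representational swap at the network input: each context word's 1-of-$|V|$ vector is replaced by a low-dimensional continuous vector learned with Word2Vec before being fed to the DNN. The sketch below illustrates this with NumPy only; the vocabulary size, the 300-dimensional embedding, the two-word tri-gram context, and the randomly initialized lookup table are illustrative assumptions, not the authors' actual configuration (in the paper the vectors would come from Word2Vec trained on a large corpus).

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's exact setup).
# A tri-gram DNN LM predicts w_t from the two preceding words (w_{t-2}, w_{t-1}).
VOCAB_SIZE = 20000      # |V|
EMBED_DIM = 300         # continuous word vector dimension

rng = np.random.default_rng(0)

# Stand-in for a Word2Vec-style embedding table: one row per vocabulary word.
embedding_table = rng.normal(scale=0.1, size=(VOCAB_SIZE, EMBED_DIM))


def one_hot(word_id, vocab_size=VOCAB_SIZE):
    """1-of-|V| coding of a single word."""
    v = np.zeros(vocab_size)
    v[word_id] = 1.0
    return v


def discrete_input(context_ids):
    """Baseline DNN LM input: concatenated 1-of-|V| vectors for the context words."""
    return np.concatenate([one_hot(w) for w in context_ids])


def continuous_input(context_ids):
    """Proposed input: concatenated continuous word vectors (embedding lookup)."""
    return np.concatenate([embedding_table[w] for w in context_ids])


# Two preceding words of a tri-gram context (arbitrary word ids).
context = [17, 4242]

x_discrete = discrete_input(context)      # dimension: 2 * VOCAB_SIZE
x_continuous = continuous_input(context)  # dimension: 2 * EMBED_DIM = 600

print(x_discrete.shape)    # (40000,)
print(x_continuous.shape)  # (600,)
```

Because the input vector is orders of magnitude smaller, the weight matrix between the input and the first hidden layer shrinks accordingly, which is the likely source of the reported reduction in training time.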
Keywords
deep neural network; language model; continuous word vector; input dimension reduction
Citations & Related Records
  • Reference
1 Bengio, Y., Ducharme, R., Vincent, P. and Jauvin, C. (2003). A neural probabilistic language model, Journal of Machine Learning Research, Vol. 3, 1137-1155.
2 Bengio, Y. (2009). Learning deep architectures for AI, Foundations and Trends in Machine Learning, Vol. 2, No. 1, 1-127.
3 Schwenk, H. & Gauvain, J. (2005). Training neural network language models on very large corpora, in Proc. Empirical Methods in Natural Language Processing, 201-208.
4 Arisoy, E., Sainath, T., Kingsbury, B. and Ramabhadran, B. (2012). Deep neural network language models, in Proc. NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 20-28.
5 Turney, P. & Pantel, P. (2010). From frequency to meaning: vector space models of semantics, Journal of Artificial Intelligence Research, Vol. 37, No. 1, 141-188.
6 Schutze, H. & Pedersen, J. (1995). Information retrieval based on word sense, in Proc. Symposium on Document Analysis and Information Retrieval, 161-175.
7 Rubenstein, H. & Goodenough, J. (1965). Contextual correlates of synonymy, Communications of the ACM, Vol. 8, No. 10, 627-633.
8 Bruni, E., Boleda, G., Baroni, M. and Tran, N. (2012). Distributional semantics in technicolor, in Proc. 50th Annual Meeting of the Association for Computational Linguistics, 136-145.
9 Mikolov, T. (2013). Word2Vec, https://code.google.com/p/word2vec.
10 Faruqui, M. & Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org, in Proc. Association for Computational Linguistics, 1-6.
11 Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G. and Ruppin, E. (2001). Placing search in context: the concept revisited, in Proc. The Tenth International World Wide Web Conference, 406-414.
12 Bruni, E., Boleda, G., Baroni, M. and Tran, N. (2012). Distributional semantics in technicolor, in Proc. 50th Annual Meeting of the Association for Computational Linguistics, 136-145.
13 Luong, M., Socher, R. and Manning, C. (2013). Better word representations with recursive neural networks for morphology, in Proc. Computational Natural Language Learning, 1-10.