Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model

  • Received : 2015.08.25
  • Accepted : 2015.11.23
  • Published : 2015.12.31

Abstract

In this paper, we investigate an input dimension reduction method using continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors are generated with Google's Word2Vec from a large training corpus, following the distributional hypothesis, and the 1-of-$|V|$ coded discrete word vectors at the input are replaced with their corresponding continuous word vectors. In our implementation, the input dimension was successfully reduced from 20,000 to 600 when a tri-gram language model was used with a vocabulary of 20,000 words, and the total training time on the Wall Street Journal training corpus (37M words) was reduced from 30 days to 14 days.
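
The substitution described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the gensim 4.x Word2Vec API in place of Google's original word2vec tool, an illustrative 300-dimensional vector size so that the two history words of a tri-gram model yield the 600-dimensional input mentioned in the abstract, a toy corpus in place of the Wall Street Journal data, and a hypothetical helper name trigram_input.

    # Minimal sketch (assumes gensim 4.x; vector size and corpus are illustrative).
    import numpy as np
    from gensim.models import Word2Vec

    # Train continuous word vectors on a tokenized corpus; in the paper this
    # would be the 37M-word Wall Street Journal training corpus.
    sentences = [["the", "stock", "market", "rose"],
                 ["the", "stock", "market", "fell"]]
    model = Word2Vec(sentences, vector_size=300, window=5, min_count=1)

    def trigram_input(w1, w2):
        # A tri-gram language model conditions on the two history words.
        # Concatenating their 300-dim continuous vectors gives a 600-dim
        # input, replacing sparse 1-of-|V| coding over a 20,000-word
        # vocabulary.
        return np.concatenate([model.wv[w1], model.wv[w2]])

    x = trigram_input("stock", "market")
    print(x.shape)  # (600,)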

References

  1. Bengio, Y., Ducharme, R., Vincent, P. and Jauvin, C. (2003). A neural probabilistic language model, Journal of Machine Learning Research, Vol. 3, 1137-1155.
  2. Bengio, Y. (2009). Learning deep architectures for AI, Foundations and Trends in Machine Learning, Vol. 2, No. 1, 1-127. https://doi.org/10.1561/2200000006
  3. Schwenk, H. and Gauvain, J. (2005). Training neural network language models on very large corpora, in Proc. Empirical Methods in Natural Language Processing, 201-208.
  4. Arisoy, E., Sainath, T., Kingsbury, B. and Ramabhadran, B. (2012). Deep neural network language models, in Proc. NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 20-28.
  5. Turney, P. and Pantel, P. (2010). From frequency to meaning: vector space models of semantics, Journal of Artificial Intelligence Research, Vol. 37, No. 1, 141-188. https://doi.org/10.1613/jair.2934
  6. Schutze, H. and Pedersen, J. (1995). Information retrieval based on word sense, in Proc. Symposium on Document Analysis and Information Retrieval, 161-175.
  7. Rubenstein, H. and Goodenough, J. (1965). Contextual correlates of synonymy, Communications of the ACM, Vol. 8, No. 10, 627-633. https://doi.org/10.1145/365628.365657
  8. Bruni, E., Boleda, G., Baroni, M. and Tran, N. (2012). Distributional semantics in technicolor, in Proc. 50th Annual Meeting of the Association for Computational Linguistics, 136-145.
  9. Mikolov, T. (2013). Word2Vec, https://code.google.com/p/word2vec.
  10. Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org, in Proc. Association for Computational Linguistics, 1-6.
  11. Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G. and Ruppin, E. (2001). Placing search in context: the concept revisited, in Proc. The Tenth International World Wide Web Conference, 406-414.
  12. Luong, M., Socher, R. and Manning, C. (2013). Better word representations with recursive neural networks for morphology, in Proc. Computational Natural Language Learning, 1-10.