http://dx.doi.org/10.5392/JKCA.2016.16.10.687

Semantic Extension Search for Documents Using the Word2vec

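Kim, Woo-ju (Department of Information and Industrial Engineering, Yonsei University)
Kim, Dong-he (Korea Railroad Research Institute)
Jang, Hee-won (Department of Information and Industrial Engineering, Yonsei University)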
Abstract
The conventional way to search documents is to issue keyword-based queries against a vector space model with a term-weighting scheme such as tf-idf. Keyword-based document search has a clear limitation: it cannot recognize that words which are lexically different may be semantically the same. This paper studies a document search scheme in which the query is itself a document. In particular, it represents query documents with centrality vectors instead of tf-idf vectors, and combines them with the Word2vec method to capture the semantic similarity between the words they contain. The scheme improves document search performance and makes it possible to find documents that are close to a query document not only lexically but also semantically.
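The contrast between lexical and semantic matching described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the 3-dimensional word vectors are hypothetical toy values standing in for a trained Word2vec model (in practice they might come from, for example, gensim's `Word2Vec` class), and each document is represented by a plain unweighted average of its word vectors rather than by the paper's centrality vectors, which the abstract does not define. The names `tfidf_vectors`, `embedding_vector`, and `TOY_WORD_VECTORS` are illustrative only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return a sparse tf-idf vector (dict: word -> weight) for each tokenized document."""
    n = len(docs)
    # Document frequency: the number of documents in which each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf, log((1 + n) / (1 + df)) + 1, keeps every weight positive.
        vectors.append({w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (key -> value)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# HYPOTHETICAL toy word vectors; a real system would use vectors learned by Word2vec.
TOY_WORD_VECTORS = {
    "car":        (0.90, 0.10, 0.00),
    "automobile": (0.85, 0.15, 0.05),
    "repair":     (0.10, 0.90, 0.10),
    "fix":        (0.15, 0.85, 0.05),
    "banana":     (0.00, 0.10, 0.95),
}

def embedding_vector(doc, word_vectors=TOY_WORD_VECTORS):
    """Unweighted mean of the known words' vectors, returned as a dict: dimension -> value.

    Assumption: uniform word weights; a weighting scheme would replace this plain average.
    """
    known = [word_vectors[w] for w in doc if w in word_vectors]
    if not known:
        return {}
    dim = len(known[0])
    return {i: sum(v[i] for v in known) / len(known) for i in range(dim)}

# Example: doc_a means the same as the query but shares no words with it.
query = ["car", "repair"]
doc_a = ["automobile", "fix"]
doc_b = ["banana"]

q_tfidf, a_tfidf, b_tfidf = tfidf_vectors([query, doc_a, doc_b])
lexical = {"doc_a": cosine(q_tfidf, a_tfidf), "doc_b": cosine(q_tfidf, b_tfidf)}

q_emb = embedding_vector(query)
semantic = {"doc_a": cosine(q_emb, embedding_vector(doc_a)),
            "doc_b": cosine(q_emb, embedding_vector(doc_b))}

print("tf-idf similarity:", lexical)
print("word-vector similarity:", semantic)
```

With tf-idf, the query and `doc_a` share no terms, so their cosine similarity is exactly zero even though they say the same thing. Averaged word vectors place them close together and rank `doc_a` well above the unrelated `doc_b`. The uniform average in `embedding_vector` is the natural place to plug in per-word weights; how the paper actually combines its centrality vectors with Word2vec is not specified in the abstract.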
Keywords
Semantic Search; Document Feature Vector; Vector Space Model; Word2vec