[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5392/JKCA.2016.16.10.687

Semantic Extention Search for Documents Using the Word2vec

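Kim, Woo-ju (Department of Information and Industrial Engineering, Yonsei University)
Kim, Dong-he (Korea Railroad Research Institute)
Jang, Hee-won (Department of Information and Industrial Engineering, Yonsei University)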

Publication Information

The Journal of the Korea Contents Association / v.16, no.10, 2016, pp. 687-692

Abstract

The conventional way to search documents is keyword-based querying with a vector space model such as tf-idf. Keyword-based search has a limitation: it cannot recognize words that are lexically different but semantically the same. This paper studies a document search scheme in which the query is itself a document. In particular, it represents query documents with centrality vectors instead of tf-idf vectors, combined with the Word2vec method to capture the semantic similarity between the words they contain. This scheme improves document search performance and provides a way to find documents that are not only lexically but also semantically close to a query document.
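
The abstract does not say how the centrality vectors are built, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes a PageRank-style centrality over a graph whose edges are cosine similarities between word vectors, and it scores a candidate document by matching each query word to its most similar candidate word. The embedding table `EMB`, the function names, and the damping value are all hypothetical; in practice the vectors would come from a trained Word2vec model.

```python
import math

# Hypothetical 3-d vectors standing in for trained Word2vec embeddings.
EMB = {
    "car":        (0.9, 0.1, 0.0),
    "automobile": (0.85, 0.15, 0.0),
    "engine":     (0.7, 0.3, 0.1),
    "road":       (0.6, 0.2, 0.3),
    "banana":     (0.0, 0.1, 0.9),
    "fruit":      (0.05, 0.2, 0.85),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centrality_weights(words, emb=EMB, damping=0.85, iters=50):
    """Assumed construction: PageRank-style scores on a word-similarity graph.

    Returns a dict mapping each known word to a weight; weights sum to 1.
    """
    vocab = sorted({w for w in words if w in emb})
    n = len(vocab)
    if n == 0:
        return {}
    # Edge weight = non-negative cosine similarity between distinct words.
    sim = [[max(cosine(emb[a], emb[b]), 0.0) if a != b else 0.0
            for b in vocab] for a in vocab]
    out = [sum(row) for row in sim]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            incoming = sum(rank[i] * sim[i][j] / out[i]
                           for i in range(n) if out[i] > 0)
            new.append((1 - damping) / n + damping * incoming)
        total = sum(new)
        rank = [r / total for r in new]
    return dict(zip(vocab, rank))


def semantic_score(query_words, doc_words, emb=EMB):
    """Centrality-weighted best-match similarity of a candidate to the query."""
    weights = centrality_weights(query_words, emb)
    doc_vocab = [w for w in set(doc_words) if w in emb]
    if not doc_vocab:
        return 0.0
    return sum(wt * max(cosine(emb[q], emb[d]) for d in doc_vocab)
               for q, wt in weights.items())


query = ["car", "engine", "road"]
related = semantic_score(query, ["automobile", "road"])
unrelated = semantic_score(query, ["banana", "fruit"])
print(related > unrelated)
```

In this toy setup the candidate containing "automobile" ranks above the unrelated one even though it never contains the query word "car", which is the kind of lexically different but semantically close match the abstract describes.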

Keywords

Semantic Search; Document Feature Vector; Vector Space Model; Word2vec

Reference

1	T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," Advances in Neural Information Processing Systems, 2013.
2	Yoshua Bengio, New distributed probabilistic language models, Dept. IRO, Université de Montréal, Montreal, QC, Canada, Tech. Rep. 1215, 2002.
3	Yoshua Bengio and Samy Bengio, "Modeling high-dimensional discrete data with multi-layer neural networks," In NIPS, Vol.99, pp.400-406, 1999.
4	Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin, "A neural probabilistic language model," The Journal of Machine Learning Research, Vol.3, pp.1137-1155, 2003.
5	Yoshua Bengio and Jean-Sebastien Senecal, "Quick training of probabilistic neural nets by importance sampling," In AISTATS Conference, 2003.
6	Gerard Salton, Anita Wong, and Chung-Shu Yang, "A vector space model for automatic indexing," Communications of the ACM, Vol.18, No.11, pp.613-620, 1975.
7	David Dubin, "The most influential paper Gerard Salton never wrote," 2004.
8	Ronan Collobert and Jason Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," In Proceedings of the 25th International Conference on Machine Learning, pp.160-167, ACM, 2008.
9	S. Brin and L. Page, "The Anatomy of a Large-scale Hypertextual Web Search Engine," Computer Networks and ISDN Systems, Vol.30, pp.107-117, 1998.
10	T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.

Korean title: Word2vec을 활용한 문서의 의미 확장 검색방법 (A Semantic Extension Search Method for Documents Using Word2vec)