http://dx.doi.org/10.6109/jkiice.2018.22.10.1300

The Sentence Similarity Measure Using Deep-Learning and Char2Vec  

Lim, Geun-Young (Department of Information Security, Daejeon University)
Cho, Young-Bok (Department of Information Security, Daejeon University)
Abstract
The purpose of this study is to evaluate Char2Vec as an alternative to Word2Vec, the most widely used word embedding model, for the sentence similarity measurement problem with deep learning. In the experiment, we used a Siamese Ma-LSTM recurrent neural network architecture to measure the similarity of two random sentences. The Siamese Ma-LSTM model was implemented with TensorFlow. Each model was trained for 200 epochs in a GPU environment, which took about 20 hours. We then compared the training results of the Word2Vec-based model with those of the Char2Vec-based model. As a result, the Char2Vec-based model, initialized with random weights, recorded 75.1% accuracy on the validation dataset, while the Word2Vec-based model, pretrained on 3 million words and phrases, recorded 71.6% accuracy on the validation dataset. Therefore, Char2Vec is a suitable alternative to Word2Vec for mitigating its high system memory requirements.
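The Ma-LSTM architecture named in the abstract scores a sentence pair by comparing the final hidden states of two weight-sharing encoders with an exponentiated negative Manhattan (L1) distance, following Mueller & Thyagarajan (2016). A minimal NumPy sketch of that scoring function, with the encoders omitted (`h_a`, `h_b`, `h_c` are illustrative stand-ins for the encoders' final hidden states):

```python
import numpy as np

def manhattan_similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """Ma-LSTM similarity: exp(-||h1 - h2||_1), in the range (0, 1].

    h1, h2: final hidden states produced by the two weight-sharing
    encoders for the two input sentences. Identical states score 1.0;
    the score decays toward 0 as the L1 distance grows.
    """
    return float(np.exp(-np.sum(np.abs(h1 - h2))))

# Illustrative hidden states (in the real model these come from the LSTM).
h_a = np.array([0.2, -0.1, 0.4])
h_b = np.array([0.2, -0.1, 0.4])   # same as h_a -> similarity 1.0
h_c = np.array([0.9, 0.3, -0.5])   # L1 distance 2.0 -> exp(-2.0)

print(manhattan_similarity(h_a, h_b))  # 1.0
print(manhattan_similarity(h_a, h_c))  # ~0.135
```

Because the output is bounded in (0, 1], it can be used directly as a similarity label prediction without an extra output layer, which is one reason this distance is preferred over cosine or L2 in the Siamese setup.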
Keywords
Word2Vec; Char2Vec; Deep-learning; GRU; NLP;
Citations & Related Records
Times Cited By KSCI : 3  (Citation Analysis)
1 S. J. Park, S. M. Choi, H. J. Lee, J. B. Kim, "Spatial analysis using R based Deep Learning," Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology, vol. 6, no. 4, pp. 1-8, April 2016.
2 J. M. Kim and J. H. Lee, "Text Document Classification Based on Recurrent Neural Network Using Word2vec," Journal of Korean Institute of Intelligent Systems, vol. 27, no. 6, pp. 560-565, Dec. 2017.   DOI
3 P. Baudis, S. Stanko and J. Sedivy, "Joint Learning of Sentence Embeddings for Relevance and Entailment," in The Workshop on Representation Learning for NLP, Berlin, Germany, pp. 18-26, 2016.
4 J. Y. Kim and E. H. Park, "e-Learning Course Reviews Analysis based on Big Data Analytics," Journal of the Korea Institute of Information and Communication Engineering, vol. 21, no. 2, pp. 423-428, Feb. 2017.   DOI
5 J. M. Kim and J. H. Lee, "Text Document Classification Based on Recurrent Neural Network Using Word2vec," Journal of Korean Institute of Intelligent Systems, vol. 27, no. 6, pp. 560-565, Dec. 2017.   DOI
6 J. Mueller and A. Thyagarajan, "Siamese Recurrent Architectures for Learning Sentence Similarity," in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Arizona, pp. 2786-2792, 2016.
7 Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, "Character-Aware Neural Language Models," in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Arizona, pp. 2741-2749, 2016.
8 Naver AI Hackathon 2018 Team Sadang solution [Internet]. Available: https://github.com/moonbings/naver-ai-hackathon-2018.
9 R. Dey and F. M. Salem, "Gate-variants of gated recurrent unit (GRU) neural networks," in 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, pp. 1597-1600, 2017.
10 fast.ai wiki: Log Loss [Internet]. Available: http://wiki.fast.ai/index.php/Log_Loss
11 D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in The 3rd International Conference for Learning Representations, San Diego, pp. 1-15, 2015.