http://dx.doi.org/10.17703/JCCT.2022.8.5.489

Study on Difference of Wordvectors Analysis Induced by Text Preprocessing for Deep Learning  

Ko, Kwang-Ho (Dept. of Smart Automobile, Pyeongtaek Univ)
Publication Information
The Journal of the Convergence on Culture Technology / v.8, no.5, 2022, pp. 489-495
Abstract
Changes in corpus preprocessing produce differences in the LSTM deep-learning (D/L) results for language-model construction. In this study, an LSTM model was trained on the poems of a renowned literary work (Ki Hyung-do's poetry) as the training corpus. Once D/L training is completed, two word-vector sets are obtained for the two corpus sets: the original text and the text with word endings removed. For the two corpus sets, the results of similarity/analogy operations, the positions of the word vectors on a 2D plane, and the texts generated by the language models were inspected. The words suggested by the similarity/analogy operations change with the corpus set, but they remain well related given the character of the corpus as a literary work. The positions of the word vectors differ for each corpus set, yet the words retain their basic meanings; likewise, the generated texts differ for each corpus set but preserve the flavor of the original style. The analysis results suggest that a D/L language model can be a useful tool for appreciating literature objectively and from diverse perspectives.
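The similarity/analogy operations described above can be sketched with plain cosine arithmetic on word vectors. The sketch below uses toy hypothetical vectors and words (not the study's actual LSTM embeddings or Korean vocabulary), assuming only NumPy:

```python
import numpy as np

# Toy word vectors: hypothetical values for illustration only.
# In the study, vectors would come from the trained LSTM's embedding layer.
vectors = {
    "night":  np.array([0.9, 0.1, 0.0]),
    "dark":   np.array([0.8, 0.2, 0.1]),
    "day":    np.array([0.1, 0.9, 0.0]),
    "bright": np.array([0.2, 0.8, 0.1]),
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query_vec, exclude=()):
    # Return the vocabulary word whose vector is closest to the query.
    ranked = [(w, cosine(query_vec, v))
              for w, v in vectors.items() if w not in exclude]
    return max(ranked, key=lambda x: x[1])[0]

# Similarity: nearest neighbor of "night"
print(most_similar(vectors["night"], exclude=("night",)))  # dark

# Analogy: night - dark + bright ~= ?
q = vectors["night"] - vectors["dark"] + vectors["bright"]
print(most_similar(q, exclude=("night", "dark", "bright")))  # day
```

Running the same operations on vectors trained from differently preprocessed corpora (original vs. word-ending-removed text) is what produces the differing suggested words the abstract reports.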
Keywords
Deep Learning; Wordvector; Ki Hyung-do; Text Preprocessing; Text Generation; Similarity;
Citations & Related Records
Times Cited By KSCI : 2  (Citation Analysis)
1 A. Basirat, "Real-valued Syntactic Word Vectors," Journal of Experimental & Theoretical Artificial Intelligence, 32(4), pp. 557-579, 2020.   DOI
2 K. Kwangho, et al., "Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model," Phonetics and Speech Sciences, 7(4), pp. 3-8, 2015.   DOI
3 L. Hickman, et al., "Text Preprocessing for Text Mining in Organizational Research: Review and Recommendations," Organizational Research Methods, 25(1), pp. 114-146, 2022.   DOI
4 F. Heimerl, M. Gleicher, "Interactive Analysis of Word Vector Embeddings," Computer Graphics Forum, 37(3), pp. 253-265, 2018.   DOI
5 Y. Chang, et al., "Using Word Semantic Concepts for Plagiarism Detection in Text Documents," Information Retrieval Journal, 24(4-5), pp. 298-321, 2021.   DOI
6 K. Sinjae, "Learning Tagging Ontology from Large Tagging Data," Journal of Korean Institute of Intelligent Systems, 18(2), pp. 157-162, 2008.   DOI
7 A. Gavric, et al., "Real-Time Data Processing Techniques for a Scalable Spatial and Temporal Dimension Reduction," 21st International Symposium (INFOTEH), pp. 1-6, 2022.
8 Y. Lee, et al., "Applying Convolution Filter to Matrix of Word-clustering Based Document Representation," Neurocomputing, 315, pp. 210-220, 2018, doi:10.1016/j.neucom.2018.07.018.   DOI
9 N. Fatima, et al., "A Systematic Literature Review on Text Generation Using Deep Neural Network Models," IEEE Access, 10, pp. 53490-53503, 2022.   DOI
10 K. Hyungsuc, Y. Janghoon, "Analyzing Semantic Relations of Word Vectors Trained by the Word2vec Model," Journal of KIISE, 46(10), pp. 1088-1093, 2019.   DOI
11 K. Kwangho, "Deep Learning Application for Core Image Analysis of the Poems by Ki Hyung-Do," Journal of the Convergence on Culture Technology, 7(3), pp. 591-598, 2021.   DOI