http://dx.doi.org/10.5351/KJAS.2021.34.3.295

Understanding the semantic change of Hangeul using word embedding  

Sun, Hyunseok (Department of Applied Statistics, Chung-Ang University)
Lee, Yung-Seop (Department of Statistics, Dongguk University)
Lim, Changwon (Department of Applied Statistics, Chung-Ang University)
Publication Information
The Korean Journal of Applied Statistics / v.34, no.3, 2021, pp. 295-308
Abstract
In recent years, the development of the internet and computer technologies has led many people to post their interests on social media and to store documents in digital form, and the amount of text data generated has exploded. Accordingly, demand is growing for technology that extracts valuable information from large document collections. In this study, we use statistical techniques to investigate how the meanings of Korean words change over time, drawing on public data from presidential speech records and newspaper articles. Based on this, we present a strategy that can be used to study the diachronic change of Hangeul. The purpose of this study is to move beyond theoretical studies of Hangeul language phenomena, which have relied on the intuition of linguists or native speakers, by deriving numerical values from public documents available to anyone and using them to explain changes in word meaning.
Keywords
semantic change; word2vec; Procrustes alignment; corpus linguistics
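The abstract and keywords point to a general recipe: train word embeddings on text from different periods, align the embedding spaces with an orthogonal Procrustes rotation, and compare each word's vectors across periods. The following is a minimal sketch of the alignment and comparison steps only, and it is not the authors' actual pipeline, which this page does not describe. The function names and the random toy matrices are hypothetical; in practice the rows would come from word2vec models trained on each period and restricted to a shared vocabulary.

```python
import numpy as np


def procrustes_align(base, other):
    """Return `other` rotated onto `base`.

    Finds the orthogonal matrix R minimizing ||other @ R - base||_F
    via the SVD of the cross-covariance matrix, then applies it.
    """
    u, _, vt = np.linalg.svd(other.T @ base)
    rotation = u @ vt
    return other @ rotation


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def semantic_shift(base, other, vocab):
    """Cosine distance per word between `base` and the aligned `other`.

    Row i of both matrices must hold the vector for vocab[i].
    """
    aligned = procrustes_align(base, other)
    return {w: cosine_distance(base[i], aligned[i]) for i, w in enumerate(vocab)}


# Toy data (assumption): random vectors stand in for trained embeddings.
rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(200)]
early = rng.normal(size=(200, 8))

# The later space is the earlier one under an arbitrary orthogonal map,
# mimicking two independently trained models with different axes.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
late = early @ q

# Pretend that one word, "w7", changed meaning between the two periods.
late[7] = -late[7]

shifts = semantic_shift(early, late, vocab)
most_changed = max(shifts, key=shifts.get)
print(most_changed, round(shifts[most_changed], 3))
```

Without the alignment step, vectors from separately trained models are not directly comparable, because each training run places its axes arbitrarily. Restricting the map to an orthogonal matrix preserves the distances and angles within each space, so any remaining distance for a word reflects a change in that word's position relative to the rest of the vocabulary.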