http://dx.doi.org/10.14369/jkmc.2019.32.1.061

Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method  

Oh, Junho (Korea Institute of Oriental Medicine)
Publication Information
Journal of Korean Medical Classics / v.32, no.1, 2019, pp. 61-74
Abstract
Objectives: This study aims to help researchers select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data. Methods: Using prescription data that embody traditional methods of East Asian medicine, we examined four count-based and two prediction-based word embedding methods. To compare these methods intuitively, we proposed a "prescription generating game" and compared its results with those obtained by applying the six methods. Results: When adjacent vectors were extracted, the count-based word embedding methods returned the main herbs that are frequently used in conjunction with each other, whereas the prediction-based word embedding methods returned synonyms of the herbs. Conclusions: Count-based word embedding methods seem to be more effective than prediction-based word embedding methods in analyzing the use of domesticated herbs. Among the count-based methods, the TF vector tends to exaggerate the effect of frequency, so the TF-IDF vector or the co-word vector may be a more reasonable choice. The t-score vector may be recommended when searching for unusual information that frequency alone does not reveal. Prediction-based embedding, on the other hand, seems to be effective for deriving the bases of similar meanings in context.
Keywords
Word embedding; East Asian traditional medicine; Korean Medicine; data analysis; natural language processing
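The contrast the abstract draws between frequency-driven vectors and the t-score vector can be illustrated with a minimal sketch. It is not the implementation used in the study: the toy prescriptions, the function names, and the choice of the common collocation t-score, (observed - expected) / sqrt(observed), are all assumptions made here for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy prescriptions; the herb lists are illustrative, not data from the study.
PRESCRIPTIONS = [
    ["ginseng", "licorice", "ginger"],
    ["ginseng", "licorice", "atractylodes"],
    ["ephedra", "licorice", "cinnamon"],
    ["ephedra", "licorice", "apricot_kernel"],
    ["ephedra", "licorice", "cinnamon", "ginger"],
    ["ginseng", "licorice", "jujube"],
]


def count_herbs_and_pairs(prescriptions):
    """Count the prescriptions containing each herb and each unordered herb pair."""
    herb_freq, pair_freq = Counter(), Counter()
    for prescription in prescriptions:
        herbs = sorted(set(prescription))
        herb_freq.update(herbs)
        # combinations() over a sorted list yields alphabetically ordered pairs.
        pair_freq.update(combinations(herbs, 2))
    return herb_freq, pair_freq


def rank_partners(target, prescriptions, measure="count"):
    """Rank herbs that co-occur with `target` by raw count or by t-score.

    Assumed t-score definition: (observed - expected) / sqrt(observed),
    where expected = freq(target) * freq(herb) / number of prescriptions.
    """
    herb_freq, pair_freq = count_herbs_and_pairs(prescriptions)
    n = len(prescriptions)
    scores = {}
    for herb in herb_freq:
        if herb == target:
            continue
        observed = pair_freq[tuple(sorted((target, herb)))]
        if observed == 0:
            continue  # herbs that never co-occur with the target are skipped
        if measure == "t":
            expected = herb_freq[target] * herb_freq[herb] / n
            scores[herb] = (observed - expected) / math.sqrt(observed)
        else:
            scores[herb] = float(observed)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


by_count = rank_partners("ephedra", PRESCRIPTIONS, measure="count")
by_t = rank_partners("ephedra", PRESCRIPTIONS, measure="t")
```

In this toy data, licorice appears in every prescription, so it tops the raw-count ranking for ephedra, while its t-score is zero because that co-occurrence is exactly what chance would predict. Cinnamon, which is less frequent overall, ranks first by t-score.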