http://dx.doi.org/10.5351/KJAS.2014.27.7.1207

Text Mining for Korean: Characteristics and Application to 2011 Korean Economic Census Data  

Goo, Juna (Biostatistics and Clinical Epidemiology Center, Samsung Medical Center)
Kim, Kyunga (Biostatistics and Clinical Epidemiology Center, Samsung Medical Center)
Publication Information
The Korean Journal of Applied Statistics / v.27, no.7, 2014, pp. 1207-1217
Abstract
The 2011 Korean Economic Census is the first economic census in Korea. It contains text data on menus served by Korean-food restaurants as well as structured data on restaurant characteristics, including area, opening year, and total sales. In this paper, we applied text mining to the text data and investigated the statistical and technical issues and characteristics of Korean text mining. Pork belly roast was the most popular menu item across provinces and restaurant types in 2010, and the number of restaurants per 10,000 people was especially high in Kangwon-do and Daejeon Metropolitan City. Beef tartare and fried pork cutlet are popular menu items in start-up restaurants, while whole chicken soup and maeuntang (spicy fish stew) are popular in long-lived restaurants. These results can serve as a guideline for restaurant owners developing menus, and for government policy-making that leads small restaurants to choose suitable menus for a successful business.
Keywords
Text mining; dictionary construction; big data; Korean Economic Census
Citations & Related Records
Times Cited By KSCI: 6