Text Mining for Korean: Characteristics and Application to 2011 Korean Economic Census Data

  • Goo, Juna (Biostatistics and Clinical Epidemiology Center, Samsung Medical Center) ;
  • Kim, Kyunga (Biostatistics and Clinical Epidemiology Center, Samsung Medical Center)
  • Received : 2014.10.14
  • Accepted : 2014.11.21
  • Published : 2014.12.31

Abstract

The 2011 Korean Economic Census, the first economic census in Korea, contains text data on the menu items served by Korean-food restaurants as well as structured data on restaurant characteristics, including area, opening year, and total sales. In this paper, we apply text mining to the menu text and examine the statistical and technical issues that arise, and through them the characteristics of Korean text mining. Pork belly roast was the most popular menu item across provinces and restaurant types in 2010, and the number of restaurants serving it per 10,000 people was especially high in Kangwon-do and Daejeon Metropolitan City. Beef tartare and fried pork cutlet were popular menu items in start-up restaurants, while whole chicken soup and maeuntang (spicy fish stew) were popular in long-lived restaurants. These results can serve as a guideline for restaurant owners developing menus, and can inform government policy that helps small restaurants choose menus suited to a successful business.

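Within the 2011 Economic Census, the first complete enumeration of all business establishments in Korea, the data on Korean-food restaurants are big data that combine text data on the menu items served with structured data describing establishment characteristics such as business location, opening year and month, and sales. In this study, we examine the statistical and technical problems that arise when text mining is applied to the menu data, and through them consider the characteristics of Korean text mining. We also combine the text mining results with the establishment characteristics data to explore associations between Korean-food menu items and the characteristics of the establishments that serve them. As of 2010, the most popular menu item, served by the largest number of establishments, was pork belly roast, and the number of establishments serving it relative to population was especially high in Kangwon-do and Daejeon Metropolitan City. The popular menu items among newly opened establishments were beef tartare and pork cutlet, while whole chicken soup and maeuntang were served by many long-lived establishments. These results can be used as a guideline for menu selection when opening a Korean-food restaurant, and should help the relevant government ministries prepare policies such as preventing closures by encouraging small establishments to change their menus.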
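
The per-population comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the records, province labels, population figures, and the function names `split_menus` and `restaurants_per_10k` are all invented for illustration, and the menu field is split on simple delimiters, whereas real Korean menu text would likely need a morphological analyzer and normalization of spelling variants.

```python
import re
from collections import Counter, defaultdict

# Toy records invented for illustration only; these are NOT census data.
restaurants = [
    {"province": "A", "menus": "삼겹살구이, 된장찌개"},
    {"province": "A", "menus": "삼겹살구이/냉면"},
    {"province": "B", "menus": "닭백숙 매운탕"},
    {"province": "B", "menus": "삼겹살구이"},
]

# Hypothetical population per province.
population = {"A": 20000, "B": 40000}


def split_menus(text):
    """Split a free-text menu field on commas, slashes, and whitespace.

    Returns a set so that a restaurant listing the same item twice
    is counted only once for that item.
    """
    return {token for token in re.split(r"[,/\s]+", text.strip()) if token}


def restaurants_per_10k(records, population):
    """For each province and menu item, count the restaurants serving the
    item and scale the count to a rate per 10,000 residents."""
    counts = defaultdict(Counter)
    for record in records:
        for item in split_menus(record["menus"]):
            counts[record["province"]][item] += 1
    return {
        province: {
            item: n * 10000 / population[province]
            for item, n in item_counts.items()
        }
        for province, item_counts in counts.items()
    }


rates = restaurants_per_10k(restaurants, population)
```

Joining such rates with structured fields (for example, opening year) would then allow the kind of comparison between start-up and long-lived restaurants that the abstract reports.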
