http://dx.doi.org/10.9723/jksiis.2015.20.2.113

A study on unstructured text mining algorithm through R programming based on data dictionary  

Lee, Jong Hwa (Graduate School, Pukyong National University)
Lee, Hyun-Kyu (College of Business Administration, Pukyong National University)
Publication Information
Journal of Korea Society of Industrial Information Systems, Vol. 20, No. 2, 2015, pp. 113-124
Abstract
Unlike structured data, which are collected and stored in a predefined structure, unstructured text data are mostly written in natural language, and they have found much wider application since the emergence of Web 2.0. Text mining extracts meaningful information from text. It has become one of the most important big data analysis techniques for two reasons: the volume of text data has grown, and text expresses human emotion directly. In this study, we used R, an open-source software environment for statistical analysis. We studied how to implement algorithms for analyses such as frequency analysis, cluster analysis, word clouds, and social network analysis. To keep the analysis within our research scope, we extracted keywords using a data dictionary. We then applied this approach to real cases. R proved to be a very useful statistical analysis tool because it runs on a variety of operating systems and can interface with other programming languages.
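Dictionary-based keyword extraction can be illustrated with a short sketch. The paper works in R. The code below is a Python illustration of the general technique, not the authors' implementation. It assumes that the "data dictionary" is a fixed set of terms of interest. Only those terms are kept from each document. The kept terms are then counted for frequency analysis, which is also the input a word cloud would draw from. Pairs of terms that appear in the same document are counted as well, giving weighted edges for a keyword network in social network analysis. The dictionary contents and all function names are hypothetical, and cluster analysis is left out.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical data dictionary: the only terms the analysis keeps (assumption).
DATA_DICTIONARY = {"lte", "5g", "smartphone", "network", "tablet"}


def extract_keywords(text, dictionary):
    """Tokenize lowercase text and keep only tokens found in the dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t in dictionary]


def term_frequencies(docs, dictionary):
    """Frequency analysis: count each dictionary term across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(extract_keywords(doc, dictionary))
    return counts


def cooccurrence(docs, dictionary):
    """Edge weights for a keyword network: the number of documents
    in which each pair of dictionary terms appears together."""
    edges = Counter()
    for doc in docs:
        terms = sorted(set(extract_keywords(doc, dictionary)))
        edges.update(combinations(terms, 2))
    return edges


if __name__ == "__main__":
    docs = [
        "LTE network speeds beat 3G",
        "5G network and LTE roadmap",
        "Smartphone and tablet sales",
    ]
    print(term_frequencies(docs, DATA_DICTIONARY).most_common(3))
    print(cooccurrence(docs, DATA_DICTIONARY))
```

The dictionary filter discards words such as "speeds" or "roadmap" before any counting. This narrows the output to the terms that fall within the research scope. In this sketch the filter is a plain set lookup, so matching is exact.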
Keywords
R program; Big data; Text mining; Unstructured data