http://dx.doi.org/10.5351/KJAS.2014.27.2.169

Standardizing Unstructured Big Data and Visual Interpretation using MapReduce and Correspondence Analysis  

Choi, Joseph (Department of Statistics, Pusan National University)
Choi, Yong-Seok (Department of Statistics, Pusan National University)
Publication Information
The Korean Journal of Applied Statistics / v.27, no.2, 2014, pp. 169-183
Abstract
Big data refers to the massive and varied data now recorded everywhere. Analyzing such data to find valuable information is therefore important, and standardizing unstructured big data is necessary before statistical methods can be applied. In this paper, we show how to standardize unstructured big data using MapReduce, a distributed processing system. We also apply simple correspondence analysis and multiple correspondence analysis to find the relationships and characteristics of direct relationship words for Samsung Electronics, Apple Inc., and The Korea Economic Daily newspaper.
Keywords
Big data; unstructured data; MapReduce; correspondence analysis; direct relationship words; The Korea Economic Daily