http://dx.doi.org/10.5391/IJFIS.2014.14.4.313

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework  

Kim, Young Joon (Cyberdigm)
Lee, Keon Myung (Dept. of Computer Science, Chungbuk National University)
Publication Information
International Journal of Fuzzy Logic and Intelligent Systems / v.14, no.4, 2014, pp. 313-321
Abstract
In the current era of data-intensive services, handling big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, implemented in the MapReduce distributed programming framework. The proposed method partitions the data space into a grid structure and then models the class probability distributions of the grid cells by collecting sufficient statistics with distributed MapReduce tasks. New data are labeled by k-nearest neighbor classification based on Bayesian inference.
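The pipeline described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract gives no details, so the sketch assumes an equal-width grid, per-cell class counts as the sufficient statistics, neighbors chosen as the k nearest non-empty cells by Euclidean distance between cell indices, and a posterior-mean class estimate under a symmetric Dirichlet prior. All names (`cell_of`, `map_phase`, `reduce_phase`, `build_model`, `classify`, `alpha`) are hypothetical, and the map and reduce steps are plain Python functions standing in for Hadoop tasks.

```python
from collections import Counter, defaultdict
from math import dist


def cell_of(point, lows, widths):
    """Map a numeric point to the index tuple of its grid cell (equal-width grid, assumed)."""
    return tuple(int((x - lo) // w) for x, lo, w in zip(point, lows, widths))


def map_phase(records, lows, widths):
    """Mapper stand-in: emit ((cell, label), 1) for every labeled record in one split."""
    for point, label in records:
        yield (cell_of(point, lows, widths), label), 1


def reduce_phase(pairs):
    """Reducer stand-in: sum the emitted counts per (cell, label) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def build_model(splits, lows, widths):
    """Run the mapper over every split, reduce, and group the counts by cell."""
    emitted = (pair for split in splits for pair in map_phase(split, lows, widths))
    model = defaultdict(Counter)
    for (cell, label), n in reduce_phase(emitted).items():
        model[cell][label] = n
    return model


def classify(point, model, lows, widths, classes, k=3, alpha=1.0):
    """Pool counts from the k nearest non-empty cells and return (label, posterior).

    The posterior is the Dirichlet-multinomial posterior mean with a
    symmetric prior alpha (an assumption, not taken from the paper).
    """
    home = cell_of(point, lows, widths)
    nearest = sorted(model, key=lambda c: dist(c, home))[:k]
    pooled = Counter()
    for cell in nearest:
        pooled.update(model[cell])
    total = sum(pooled.values()) + alpha * len(classes)
    posterior = {c: (pooled[c] + alpha) / total for c in classes}
    return max(posterior, key=posterior.get), posterior


# Two input splits, as if stored on different nodes (toy data, made up for illustration).
splits = [
    [((0.1, 0.2), "a"), ((0.3, 0.1), "a")],
    [((0.9, 0.8), "b"), ((0.8, 0.9), "b"), ((0.2, 0.3), "a")],
]
lows, widths = (0.0, 0.0), (0.5, 0.5)
model = build_model(splits, lows, widths)
label, posterior = classify((0.15, 0.25), model, lows, widths, ["a", "b"], k=1)
print(label, posterior)
```

Because the per-cell counts are additive, each split can be processed independently and merged in the reduce step, which is what makes this kind of statistic a natural fit for MapReduce. The size of the model depends on the number of non-empty cells rather than on the number of records.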
Keywords
big data; classification; data mining; Hadoop; MapReduce
Citations & Related Records
Times Cited By KSCI: 3