[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5391/IJFIS.2014.14.4.313

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework

Kim, Young Joon (Cyberdigm)
Lee, Keon Myung (Dept. of Computer Science, Chungbuk National University)

Publication Information

International Journal of Fuzzy Logic and Intelligent Systems / v.14, no.4, 2014, pp. 313-321

Abstract

In the current era of data-intensive services, handling big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, implemented in the MapReduce distributed programming framework. The proposed method partitions the data space into a grid structure and then models the class probability distributions of the grid cells by collecting sufficient statistics with distributed MapReduce tasks. New data are labeled by k-nearest neighbor classification based on Bayesian inference.
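The abstract leaves several details open: the grid resolution, the exact sufficient statistics stored per cell, and how neighboring cells are combined. The Python sketch below illustrates the general pipeline under stated assumptions and is not the authors' implementation. It assumes equal-width bins over known feature bounds, per-cell class counts as the sufficient statistics, map/shuffle/reduce simulated in plain Python rather than run on Hadoop, and a class posterior computed from the pooled counts of the k nearest non-empty cells under a symmetric Dirichlet (Laplace) prior. All function names (`cell_of`, `map_phase`, `shuffle`, `reduce_phase`, `classify`) and parameters (`bins`, `k`, `alpha`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


# ASSUMPTION: equal-width grid over known per-feature bounds (hi > lo).
def cell_of(x, lows, highs, bins):
    """Map a numeric vector to the integer coordinates of its grid cell."""
    coords = []
    for v, lo, hi in zip(x, lows, highs):
        i = int((v - lo) / (hi - lo) * bins)
        coords.append(min(max(i, 0), bins - 1))  # clamp values on or outside the bounds
    return tuple(coords)


def map_phase(records, lows, highs, bins):
    """Mapper: emit ((cell, label), 1) for every labeled training record."""
    for x, label in records:
        yield (cell_of(x, lows, highs, bins), label), 1


def shuffle(pairs):
    """Stand-in for the framework's shuffle step: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the ones per (cell, label) into per-cell class counts."""
    stats = defaultdict(Counter)
    for (cell, label), values in groups.items():
        stats[cell][label] += sum(values)
    return stats


def classify(x, stats, lows, highs, bins, k=3, alpha=1.0):
    """Label x from the pooled class counts of the k nearest non-empty cells.

    ASSUMPTION: cell distance is the Euclidean distance between grid
    coordinates, and the class posterior is the Dirichlet(alpha) posterior
    predictive (n_c + alpha) / (N + C * alpha) over the pooled counts.
    """
    target = cell_of(x, lows, highs, bins)
    nearest = sorted(stats, key=lambda c: math.dist(c, target))[:k]
    labels = sorted({lab for counts in stats.values() for lab in counts})
    pooled = Counter()
    for c in nearest:
        pooled.update(stats[c])  # add this cell's class counts to the pool
    total = sum(pooled.values())
    post = {lab: (pooled[lab] + alpha) / (total + alpha * len(labels))
            for lab in labels}
    return max(post, key=post.get), post


# Toy usage: two clusters on a 5 x 5 grid over [0, 10] x [0, 10].
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((8.0, 9.0), "b"), ((9.0, 8.5), "b"), ((8.5, 8.8), "b")]
lows, highs, bins = (0.0, 0.0), (10.0, 10.0), 5
stats = reduce_phase(shuffle(map_phase(train, lows, highs, bins)))
label, post = classify((1.5, 1.5), stats, lows, highs, bins, k=1)
```

In this toy run the query (1.5, 1.5) falls in the cell holding the two "a" records, so `k=1` labels it "a" with posterior 0.75. With `k=2` the pooled counts also include the three "b" records from the far cell and the label flips to "b", which shows how strongly the result can depend on `k` when occupied cells are few and far apart.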

Keywords

big data; classification; data mining; Hadoop; MapReduce

Citations & Related Records

Times Cited By KSCI: 3

Reference
Cited By KSCI

1	D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009.
2	Iris Data Set, http://archive.ics.uci.edu/ml/datasets/Iris
3	P. Cunningham, S. J. Delany, "k-Nearest Neighbor Classifiers," Technical Report UCD-CSI-2007-4, 2007.
4	F. Kovács, C. Legány, A. Babos, "Cluster validity measurement techniques," Proc. of 6th International Symposium of Hungarian Researchers on Computational Intelligence, 2005.
5	K. M. Lee, C. H. Lee, K. M. Lee, "Statistical cluster validity indexes to consider cohesion and separation," Proc. of 2012 Int. Conf. on Fuzzy Theory and Its Applications (iFUZZY 2012), pp. 228-232, 2012.
6	S.-B. Roh, J.-W. Jeong, T.-C. Ahn, "Fuzzy Learning Vector Quantization based on Fuzzy k-Nearest Neighbor Prototypes," Int. J. of Fuzzy Logic and Intell. Syst., vol. 11, no. 2, pp. 84-88, 2011.
7	S. Ko, D. Kim, B.-Y. Kang, "A Matrix-Based Genetic Algorithm for Structure Learning of Bayesian Networks," Int. J. of Fuzzy Logic and Intell. Syst., vol. 11, no. 3, pp. 135-142, 2011.
8	J. Dean, S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Proc. of the 6th Symp. on Operating Systems Design and Implementation, 2004.
9	Apache Hadoop, http://wiki.apache.org/hadoop/
10	G. Caruana, M. Li, M. Qi, "A MapReduce based Parallel SVM for Large Scale Spam Filtering," Proc. of 8th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 2659-2662, 2011.
11	K.-M. Lee, K. M. Lee, C. H. Lee, "Linguistic Classification Pattern Extraction for Numeric Data," Proc. of WSEAS, 2012.
12	L. Zhou, Z. Zhong, J. Chang, J. Li, J. Z. Huang, S. Feng, "Balanced Parallel FP-Growth with MapReduce," Proc. of IEEE YC-ICT, 2010.
13	Apache Mahout, http://mahout.apache.org/
14	M. Geetika, "A Survey of Classification Methods and its Applications," Int. J. of Computer Applications, vol. 53, no. 17, 2012.
15	S. Theodoridis, K. Koutroumbas, Pattern Recognition, Elsevier, 2009.
16	D. Lu, Q. Weng, "A survey of image classification methods and techniques for improving classification performance," Int. J. of Remote Sensing, vol. 28, no. 5, pp. 823-870, 2007.
17	M. M. Gaber, "Advances in data stream mining," WIREs Data Mining Knowl. Discov., vol. 2, pp. 79-85, 2012.
18	K. M. Lee, "Locality-Sensitive Hashing Techniques for Nearest Neighbor Search," Int. J. of Fuzzy Logic and Intell. Syst., vol. 12, no. 4, pp. 300-307, 2012.
19	A. Bifet, R. Kirkby, Data Stream Mining: A Practical Approach, The University of Waikato, 2009.
20	K. P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012.
21	C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

"Bucket-size balancing locality sensitive hashing using the map reduce paradigm," Cluster Computing, 2017.
"MapReduce-based storage and indexing for big health data," Concurrency and Computation: Practice and Experience, e4854, 2018.