http://dx.doi.org/10.5626/JCSE.2016.10.1.1

An Improved Hybrid Canopy-Fuzzy C-Means Clustering Algorithm Based on MapReduce Model  

Dai, Wei (School of Economics and Management, Hubei Polytechnic University)
Yu, Changjun (School of Computer Science and Technology, Wuhan University of Technology)
Jiang, Zilong (School of Computer Science and Technology, Wuhan University of Technology)
Publication Information
Journal of Computing Science and Engineering / v.10, no.1, 2016, pp. 1-8
Abstract
Fuzzy c-means (FCM) is a widely used clustering algorithm. However, its clustering quality and convergence rate depend on the initial cluster centers. An improved FCM algorithm is therefore proposed that uses the canopy clustering concept to analyze the dataset quickly. Because the canopy algorithm obtains cluster centers rapidly, the proposed algorithm takes the canopy clustering results as its input, which accelerates the convergence of FCM. A MapReduce scheme for the proposed algorithm is also designed for a cloud environment. Experimental results demonstrate that the hybrid canopy-FCM clustering algorithm, run on MapReduce, achieves better clustering quality and higher operation speed.
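As a rough illustration of the idea summarized in the abstract, the following single-machine sketch seeds FCM with centers from a canopy pass. It is not the authors' implementation: the function names, the thresholds `t1` and `t2`, the fuzzifier `m = 2`, the toy data, and the choice to use each canopy's mean as its center are all assumptions made for this example, and the MapReduce parallelization is omitted.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_centers(points, t1, t2):
    """Canopy pass (assumed thresholds t1 > t2): one center per canopy.

    Points within t1 of a seed form its canopy; points within t2 of the
    seed are removed from the list of future seed candidates.
    """
    assert t1 > t2 >= 0
    dim = len(points[0])
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        members = [p for p in points if dist(p, seed) <= t1]
        centers.append([sum(p[j] for p in members) / len(members)
                        for j in range(dim)])
        remaining = [p for p in remaining if dist(p, seed) > t2]
    return centers

def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard FCM iterations started from the given initial centers."""
    dim = len(points[0])
    centers = [list(c) for c in centers]
    u = []
    for _ in range(max_iter):
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        u = []
        for p in points:
            inv = [max(dist(p, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * p[j] for wi, p in zip(w, points)) / total
                                for j in range(dim)])
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = canopy_centers(points, t1=3.0, t2=1.5)  # canopy output seeds FCM
centers, u = fcm(points, init)
```

Here the canopy pass also fixes the number of clusters, so FCM starts with centers that are already close to the dense regions and needs few iterations to converge.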
Keywords
FCM; Canopy; Clustering; MapReduce
Citations & Related Records
  • Reference
1 J. Nayak, B. Naik, and H. S. Behera, "Fuzzy C-means (FCM) clustering algorithm: a decade review from 2000 to 2014," in Computational Intelligence in Data Mining-Volume 2, New Delhi: Springer India, pp. 133-149, 2015.
2 D. B. Hassen, H. Taleb, I. B. Yaacoub, and N. Mnif, "Classification of chest lesions with using fuzzy c-means algorithm and support vector machines," in International Joint Conference SOCO'13-CISIS'13-ICEUTE'13, Cham: Springer International Publishing, pp. 319-328, 2014.
3 N. Bharill and A. Tiwari, "Handling big data with fuzzy based classification approach," in Advance Trends in Soft Computing, Cham: Springer International Publishing, pp. 219-227, 2014.
4 S. R. Kannan, S. Ramathilagam, A. Sathya, and R. Pandiyarajan, "Effective fuzzy c-means based kernel function in segmenting medical images," Computers in Biology and Medicine, vol. 40, no. 6, pp. 572-579, 2010.
5 X. Wang, Y. Wang, and L. Wang, "Improving fuzzy c-means clustering based on feature-weight learning," Pattern Recognition Letters, vol. 25, no. 10, pp. 1123-1132, 2004.
6 R. M. Esteves and C. Rong, "Using Mahout for clustering Wikipedia's latest articles: a comparison between k-means and fuzzy c-means in the cloud," in Proceedings of IEEE 3rd International Conference on Cloud Computing Technology and Science (CloudCom), Athens, Greece, 2011, pp. 565-569.
7 Q. Yu and Y. Dai, "Parallel fuzzy C-means algorithm based on MapReduce," Computer Engineering and Applications, vol. 49, no. 14, pp. 133-137, 2013.
8 J. Q. Zhang, X. W. Zheng, and H. P. Wu, "Research on fuzzy C-means clustering algorithm parallel," Microcomputer & Its Applications, vol. 29, no. 23, pp. 8-18, 2010.
9 D. Irfan, X. Xu, S. Deng, and Z. He, "S-Canopy: a feature-based clustering algorithm for supplier categorization," in Proceedings of IEEE 4th Conference on Industrial Electronics and Applications, Xi'an, China, 2009, pp. 677-681.
10 A. McCallum, K. Nigam, and L. H. Ungar, "Efficient clustering of high-dimensional data sets with application to reference matching," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, 2000, pp. 169-178.
11 Y. Li, "Research on parallelization of clustering algorithm based on MapReduce," Sun Yat-Sen University, Guangzhou, China, 2010.
12 J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32-57, 1973.
13 J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum Press, 1981.
14 J. V. de Oliveira and W. Pedrycz, Advances in Fuzzy Clustering and Its Applications, New York: Wiley, 2007.
15 M. Meloun and J. Militky, Kompendium statistickeho zpracovani dat, Praha: Academia, 2006.
16 E. H. Ruspini, "Numerical methods for fuzzy clustering," Information Sciences, vol. 2, no. 3, pp. 319-350, 1970.
17 A. Al-Dallal and R. S. Abdulwahab, "Achieving high recall and precision with HTML documents: an innovation approach in information retrieval," in Proceedings of the World Congress on Engineering, London, 2011, pp. 1883-1888.
18 J. Euzenat, "Semantic precision and recall for ontology alignment evaluation," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007, pp. 348-353.