Browse > Article
http://dx.doi.org/10.14400/JDC.2015.13.11.157

A Big Data Analysis by Between-Cluster Information using k-Modes Clustering Algorithm  

Park, In-Kyoo (Dept. of Computer.Game Engineering, College of Engineering)
Publication Information
Journal of Digital Convergence / v.13, no.11, 2015 , pp. 157-164 More about this Journal
Abstract
This paper describes subspace clustering of categorical data for convergence and integration. Because categorical data are not designed for dealing only with numerical data, The conventional evaluation measures are more likely to have the limitations due to the absence of ordering and high dimensional data and scarcity of frequency. Hence, conditional entropy measure is proposed to evaluate close approximation of cohesion among attributes within each cluster. We propose a new objective function that is used to reflect the optimistic clustering so that the within-cluster dispersion is minimized and the between-cluster separation is enhanced. We performed experiments on five real-world datasets, comparing the performance of our algorithms with four algorithms, using three evaluation metrics: accuracy, f-measure and adjusted Rand index. According to the experiments, the proposed algorithm outperforms the algorithms that were considered int the evaluation, regarding the considered metrics.
Keywords
Convergence and Integration; Subspace Partition; Categorical Data; Clustering; Entropy;
Citations & Related Records
Times Cited By KSCI : 2  (Citation Analysis)
연도 인용수 순위
1 Sang-Hyun Lee, "A Study on Determining Factors for Manufacturers to Distributors Warehouse in Supply Chain", Journal of the Korea Convergence Society, Vol. 4, No. 2, pp. 15-20, 2013.   DOI
2 E. Y. Chan, W. K. Ching, M. K. Ng and J. Z. Huang, "An optimization algorithm for clustering using weighted dissimilarity measures", Pattern Recognition, Vol. 37, No. 5, pp. 943-952, 2004.   DOI
3 L. Bai, J. Liang, C. Dang, and F. Cao, "A novel attribute weighting algorithm for clustering high-dimensional categorical data", Pattern Recognition, Vol. 44, No. 12, pp. 2843-2861, 2011.   DOI
4 F. Cao, J. Liang, D. Li and X. Zhao, "A weighting k-modes algorithm for subspace clustering of categorical data", Neurocomputing, Vol. 108, pp. 23-30, 2013.   DOI
5 L. Jing, M.K. Ng, and J. Z. Hunag, "An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparce data", Knowledge and Data Engineering, IEEE Transactions on, Vol. 19, No. 8, pp. 1026-1041, 2007.   DOI
6 D. Barbara, Y. Li, and J. Couto, Coolcat: "an entropy-based algorithm for categorical clustering", in Proceedings of the 11th international conference on Information and knowledge management, ACM, pp. 582-589, 2002.
7 Z. Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values", Data mining and Knowledge Discovery, Vol.2, No. 3, pp. 283-304, 1998.   DOI
8 F. Cao, J. Liang, D. Li, L. Bai and C. Dang, "A dissimilarity measure for the k-Modes clustering algorithm, Knowledge-Based Systems", Vol. 26, pp. 120-127, 2012.   DOI
9 In-Kyu Park. "The generation of control rules for data mining", The Journal of Digital Policy & Management, Vol. 11, No.1, pp.343-349, 2013.
10 J. L. Carbonera and M. Abel, "Categorical data clustering: a correlation-based approach for unsupervised attribute weighting", in Proceedings of ICTAI, 2014.
11 J. L. Carbonera and M. Abel, "An entropy-based subspace clustering algorithm for categorical data", 2014 IEEE 26th International Conference on Tools with Artificial Intelligence, pVol. 48, No. 26, pp. 272-277, 2014.
12 G. Gan and J. Wu, "Subspace clustering for high dimensional categorical data", ACM SIGDD Explorations Newsletter, Vol. 6, No. 2, pp.87-94, 2004.   DOI
13 M. J. Zaki, M. Peters I. Assent, and T. Seidl, "Clicks: An effective algorithm for mining subspace clusters in categorical datasets", Data & Knowledge Engineering, Vol. 60, No. 1, pp. 51-70, 2007.   DOI
14 E. Cesario, G. Manco and R. Ortale, "Top-down parameter-free clustering fo high-dimensional categorical data", IEEE Trans. on Knowledge and Data Engineering, Vol. 19, No. 12, pp. 1607-1624, 2007.   DOI
15 H.-P. Kriegel, P. Kroger and A. Aimek, "Subspace clustering", Wisley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 2, No. 4, pp. 351-364, 2012.   DOI