http://dx.doi.org/10.7583/JKGS.2018.18.1.105

Extended Information Entropy via Correlation for Autonomous Attribute Reduction of BigData  

Park, In-Kyu (Dept. of Game Software, College of Engineering, Joongbu University)
Abstract
Data analysis methods for customer type analysis are important to game companies: understanding the types and characteristics of their customers helps them plan customized content and provide more convenient services. In this paper, we propose a k-modes cluster analysis algorithm that extends information entropy to measure information uncertainty and thereby reduce information loss. The similarity of attributes is measured from two aspects. The first measures the uncertainty between each attribute and the center of each partition; the second measures the uncertainty of the probability distribution of each attribute's uncertainty. In particular, because the entropy of each attribute is transformed into probabilistic information for measuring uncertainty, attribute uncertainty is considered on both a non-probabilistic and a probabilistic scale. Extensive performance analysis with various indexes shows the accuracy of the algorithm, using cluster analysis results obtained from an optimal initial value.
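The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the author's algorithm. It computes the Shannon entropy of each categorical attribute, normalizes those entropies into a probability distribution over attributes (assumed here to act as attribute weights, with weight proportional to entropy), and uses the weights in a weighted simple-matching dissimilarity inside a standard k-modes loop. The helper names (`attribute_entropies`, `entropy_weights`, `weighted_mismatch`, `farthest_first_modes`, `k_modes`) and the farthest-first initialization are hypothetical; the paper's two-aspect similarity measure and its optimal initial value are not reproduced here.

```python
import math
from collections import Counter


def attribute_entropies(data):
    """Shannon entropy (bits) of each categorical attribute (column) of `data`."""
    n = len(data)
    entropies = []
    for j in range(len(data[0])):
        counts = Counter(row[j] for row in data)
        entropies.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


def entropy_weights(entropies):
    """ASSUMPTION: normalise entropies into a probability distribution over
    attributes (weight proportional to entropy); uniform if every entropy is 0."""
    total = sum(entropies)
    if total == 0:
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]


def weighted_mismatch(record, mode, weights):
    """Weighted simple-matching dissimilarity: sum of weights of differing attributes."""
    return sum(w for x, z, w in zip(record, mode, weights) if x != z)


def column_modes(rows):
    """Most frequent value of each attribute (ties go to the value seen first)."""
    return tuple(Counter(r[j] for r in rows).most_common(1)[0][0]
                 for j in range(len(rows[0])))


def farthest_first_modes(data, k, weights):
    """HYPOTHETICAL deterministic initialisation: start from the first record, then
    repeatedly add the record farthest from all modes chosen so far."""
    modes = [tuple(data[0])]
    while len(modes) < k:
        farthest = max(data, key=lambda r: min(weighted_mismatch(r, m, weights)
                                               for m in modes))
        modes.append(tuple(farthest))
    return modes


def k_modes(data, k, max_iter=100):
    """Standard k-modes loop using the (assumed) entropy-derived attribute weights."""
    weights = entropy_weights(attribute_entropies(data))
    modes = farthest_first_modes(data, k, weights)
    labels = None
    for _ in range(max_iter):
        # Assign each record to its nearest mode under the weighted dissimilarity.
        new_labels = [min(range(k), key=lambda c: weighted_mismatch(row, modes[c], weights))
                      for row in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Recompute each cluster's mode; an empty cluster keeps its previous mode.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = column_modes(members)
    return labels, modes, weights


data = [
    ("red", "small", "round"), ("red", "small", "round"), ("red", "small", "oval"),
    ("blue", "large", "square"), ("blue", "large", "square"), ("blue", "large", "oval"),
]
labels, modes, weights = k_modes(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

On this toy data the two groups of records are separated, and the third attribute, which has three equally frequent values and therefore the highest entropy, receives the largest weight. Weighting by raw entropy is only one possible choice; some entropy-weighting schemes instead favor low-entropy attributes, so the direction of the weighting would need to be checked against the paper.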
Keywords
Information Entropy; K-modes Clustering; Categorical Data; Similarity; Uncertainty