http://dx.doi.org/10.5351/CKSS.2006.13.3.719

Cluster Analysis with Balancing Weight on Mixed-type Data  

Chae, Seong-San (Department of Applied Statistics, Daejeon University)
Kim, Jong-Min (Division of Science and Mathematics, University of Minnesota)
Yang, Wan-Youn (Department of Applied Statistics, Kyungwon University)
Publication Information
Communications for Statistical Applications and Methods / v.13, no.3, 2006, pp. 719-732
Abstract
A set of clustering algorithms is presented in which a proper weight in the formulation of distance extends the algorithms to data with mixed numeric and multiple binary values. The simple matching and Jaccard coefficients are used to measure similarity between objects on the multiple binary attributes, and these similarities are converted to dissimilarities between the i-th and j-th objects. The performance of the clustering algorithms with a balancing weight is demonstrated for the different similarity measures. Our experiments show that clustering algorithms with a proper weight applied give a competitive recovery level when a data set with mixed numeric and multiple binary attributes is clustered.
Keywords
Agglomerative clustering algorithm; mixed-type attribute; association coefficient
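
The abstract does not give the distance formula itself, so the following is only a minimal sketch of the general idea, under stated assumptions: similarity on the binary attributes is converted to dissimilarity as 1 - s, the numeric part uses Euclidean distance, and the two parts are added with a single balancing weight. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
from math import sqrt


def simple_matching(a, b):
    """Simple matching coefficient: share of binary attributes on which
    a and b agree, counting both 1-1 and 0-0 agreements."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Jaccard coefficient: 1-1 agreements divided by the number of
    attributes where at least one object has a 1 (0-0 pairs are ignored)."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumed convention: two all-zero vectors are treated as identical.
    return both / either if either else 1.0


def mixed_dissimilarity(num_i, num_j, bin_i, bin_j, weight,
                        similarity=simple_matching):
    """Dissimilarity between objects i and j on mixed-type data.

    Assumed form (not the paper's stated formula):
        d_ij = euclidean(numeric part) + weight * (1 - similarity(binary part))
    """
    d_numeric = sqrt(sum((x - y) ** 2 for x, y in zip(num_i, num_j)))
    d_binary = 1.0 - similarity(bin_i, bin_j)
    return d_numeric + weight * d_binary


# Two illustrative objects, each with 2 numeric and 5 binary attributes.
num_i, bin_i = (1.0, 2.0), (1, 0, 1, 1, 0)
num_j, bin_j = (4.0, 6.0), (1, 1, 0, 1, 0)

d_sm = mixed_dissimilarity(num_i, num_j, bin_i, bin_j, weight=2.0)
d_jc = mixed_dissimilarity(num_i, num_j, bin_i, bin_j, weight=2.0,
                           similarity=jaccard)
print(d_sm, d_jc)
```

A matrix of such pairwise dissimilarities could then be passed to any agglomerative clustering routine. The weight controls how much the binary attributes contribute relative to the numeric ones, which is the balancing role described in the abstract.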