http://dx.doi.org/10.5391/JKIIS.2003.13.6.661

A Fuzzy Clustering Algorithm for Clustering Categorical Data  

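Kim, Dae-Won (Department of Computer Science, Korea Advanced Institute of Science and Technology)
Lee, Kwang-H. (Department of Computer Science, Korea Advanced Institute of Science and Technology)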
Publication Information
Journal of the Korean Institute of Intelligent Systems, v.13, no.6, 2003, pp. 661-666
Abstract
In this paper, the conventional k-modes and fuzzy k-modes algorithms for clustering categorical data are extended by representing the clusters of categorical data with fuzzy centroids instead of the hard-type centroids used in the original algorithms. Hard-type centroids have difficulty handling ambiguous data near cluster boundaries, which may be misclassified and lead to local optima. Using fuzzy centroids makes it possible to fully exploit the power of fuzzy sets in representing the uncertainty in the classification of categorical data. The distance measure between data and fuzzy centroids is more precise and effective than those of k-modes and fuzzy k-modes. To test the proposed approach, the proposed algorithm and the two conventional algorithms were used to cluster three categorical data sets. The proposed method was found to give markedly better clustering results.
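The abstract does not give formulas, so the following Python sketch is only one plausible reading of the idea, not the authors' definition. It assumes that a fuzzy centroid stores, for each attribute, a weight for every category value (weights summing to 1), that these weights are built from membership degrees raised to a fuzzifier `m`, and that the distance to a record is the total weight the centroid places on values other than the record's. The names `fuzzy_centroid`, `distance`, and `m` are hypothetical.

```python
from collections import defaultdict


def fuzzy_centroid(records, memberships, m=1.5):
    """Build one cluster's fuzzy centroid (assumed form, not from the paper).

    records     : list of tuples of categorical values
    memberships : membership degree of each record in this cluster
    m           : fuzzifier exponent (assumption), must be > 1
    Returns one dict per attribute mapping value -> weight, weights sum to 1.
    Assumes at least one record has a positive membership.
    """
    centroid = []
    for a in range(len(records[0])):
        weights = defaultdict(float)
        for rec, mu in zip(records, memberships):
            weights[rec[a]] += mu ** m  # accumulate fuzzy support per value
        total = sum(weights.values())
        centroid.append({v: w / total for v, w in weights.items()})
    return centroid


def distance(centroid, record):
    """Per attribute, add the centroid weight on values that differ from the record's."""
    return sum(1.0 - attr_weights.get(value, 0.0)
               for attr_weights, value in zip(centroid, record))


records = [("red", "small"), ("red", "large"), ("blue", "large")]
c = fuzzy_centroid(records, [1.0, 1.0, 0.0], m=2.0)
print(distance(c, ("red", "small")))   # 0.5
print(distance(c, ("blue", "large")))  # 1.5
```

A hard k-modes centroid would keep only the single most frequent value per attribute, so the tie between "small" and "large" above would be resolved arbitrarily. The weighted form retains both values, which is the kind of boundary ambiguity the abstract says hard-type centroids handle poorly.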
Keywords
Fuzzy clustering; Categorical data; Fuzzy centroid