A Fuzzy Clustering Algorithm for Clustering Categorical Data

Kim, Dae-Won; Lee, Kwang-H.

doi:10.5391/JKIIS.2003.13.6.661

Journal of the Korean Institute of Intelligent Systems (한국지능시스템학회논문지)

Vol. 13, No. 6 / pp. 661-666 / 2003 / 1976-9172 (pISSN) / 2288-2324 (eISSN)

Korean Institute of Intelligent Systems (한국지능시스템학회)

범주형 데이터의 분류를 위한 퍼지 군집화 기법

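Kim, Dae-Won (Department of Computer Science, Korea Advanced Institute of Science and Technology);
Lee, Kwang-H. (Department of Computer Science, Korea Advanced Institute of Science and Technology)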

Published: 2003.12.01

Abstract

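This paper presents a new technique for clustering categorical data. Representative existing methods, the k-modes and fuzzy k-modes algorithms, represent each cluster center as a single value and update the centers based on the frequency of the data belonging to each cluster. When clustering data whose class boundaries are ambiguous, these methods cannot correct the classification errors that arise at each step of the algorithm and therefore end up trapped in local optima. To overcome this, the paper defines cluster centers using fuzzy sets. A fuzzy cluster center expresses the distance relationship between the given data and a cluster using fuzzy values, and each cluster center is updated using the membership degrees of the data. Introducing this fuzzy center representation allows finer-grained decisions when classifying categorical data, thereby minimizing the uncertainty that arises at the boundaries of adjacent clusters. The performance of the proposed method was verified through comparative experiments against representative existing methods.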

In this paper, the conventional k-modes and fuzzy k-modes algorithms for clustering categorical data are extended by representing the clusters of categorical data with fuzzy centroids instead of the hard-type centroids used in the original algorithms. The hard-type centroids of the traditional algorithms have difficulty dealing with ambiguous boundary data, which may be misclassified and lead to local optima. The use of fuzzy centroids makes it possible to fully exploit the power of fuzzy sets in representing the uncertainty in the classification of categorical data. The distance measure between data and fuzzy centroids is more precise and effective than those of the k-modes and fuzzy k-modes algorithms. To test the proposed approach, the proposed algorithm and the two conventional algorithms were used to cluster three categorical data sets. The proposed method was found to give markedly better clustering results.
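The contrast between a hard-type centroid and a fuzzy centroid can be illustrated with a minimal Python sketch. The abstract does not state the paper's formulas, so everything below is an assumption made for illustration: the function names (`hard_mode`, `fuzzy_centroid`, `fuzzy_distance`, `update_memberships`) are hypothetical, the category weights are taken to be normalized sums of fuzzified memberships, the distance is taken to be the centroid weight placed on categories that differ from the data value, and the membership update borrows the standard fuzzy c-means form with fuzzifier `m`.

```python
from collections import Counter, defaultdict


def hard_mode(rows):
    """Hard-type centroid: the single most frequent value of each attribute."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]


def fuzzy_centroid(rows, memberships, m=1.5):
    """Fuzzy centroid: one fuzzy set (category -> weight) per attribute.

    Assumption of this sketch: a category's weight is the sum of u**m over
    the rows carrying that category, normalized so the weights sum to 1.
    """
    centroid = []
    for col in zip(*rows):
        weights = defaultdict(float)
        for value, u in zip(col, memberships):
            weights[value] += u ** m
        total = sum(weights.values())
        if total == 0.0:
            raise ValueError("cluster has no membership mass")
        centroid.append({cat: w / total for cat, w in weights.items()})
    return centroid


def fuzzy_distance(centroid, row):
    """Per attribute, add the weight the centroid puts on other categories."""
    return sum(1.0 - attr.get(value, 0.0) for attr, value in zip(centroid, row))


def update_memberships(centroids, row, m=1.5):
    """Fuzzy c-means style membership of one row in every cluster (m > 1)."""
    dists = [fuzzy_distance(c, row) for c in centroids]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The row coincides with at least one centroid: share membership there.
        return [z / sum(zeros) for z in zeros]
    inv = [d ** (-1.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]
```

Under these assumptions, a boundary object such as `("red", "large")` placed between a cluster of `("red", "small")` objects and a cluster of `("blue", "large")` objects is equally distant from both fuzzy centroids and receives equal membership in each, whereas a hard mode would have to commit each attribute to a single value.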
