SVM based Clustering Technique for Processing High Dimensional Data

Kim, Man-Sun; Lee, Sang-Yong

doi:10.5391/JKIIS.2004.14.7.816

Journal of the Korean Institute of Intelligent Systems (한국지능시스템학회논문지)

Volume 14, Issue 7, pp. 816-820, 2004
pISSN 1976-9172, eISSN 2288-2324

Korean Institute of Intelligent Systems (한국지능시스템학회)




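Kim, Man-Sun (Information and Computing Group, Korea Research Institute of Standards and Science (KRISS); Department of Computer Engineering, Kongju National University)
Lee, Sang-Yong (Division of Information and Communication Engineering, Kongju National University)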

Published: 2004.12.01

https://doi.org/10.5391/JKIIS.2004.14.7.816

Abstract

Clustering is the process of partitioning a data set into clusters of similar data objects in order to extract meaningful information from the data. The main issues in clustering are clustering high-dimensional data effectively and solving the associated optimization problem. This study proposes a new similarity measure based on SVM (Support Vector Machines) and a method for efficiently determining the number of clusters. High-dimensional data are mapped into a feature space using kernel functions, and the similarity between neighboring clusters is then measured. From the clusters already created, the desired number of clusters is obtained using the measured similarity values and the threshold Δd. To verify the proposed methods, the authors used six data sets from the UCI Machine Learning Repository and obtained the specified number of clusters, as well as improved cohesion compared with the results of previous studies.
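
The abstract does not give the similarity formula or say exactly how Δd is applied, so the following Python sketch rests on labeled assumptions rather than on the authors' method. It assumes a Gaussian (RBF) kernel with a hypothetical width parameter `gamma`, takes the dissimilarity of two neighboring clusters to be the squared distance between their centroids in feature space (computed with the kernel trick, so the mapping is never formed explicitly), and treats Δd (the hypothetical parameter `delta_d`) as a distance threshold below which the closest clusters are merged. The `cohesion` helper, also an assumption, scores a cluster by the mean squared feature-space distance of its points to their centroid. The sketch uses only the kernel-induced geometry and does not train an SVM.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def feature_space_distance(A, B, gamma=1.0):
    """Squared distance between the feature-space centroids of clusters A and B,
    via the kernel trick: mean K(A,A) + mean K(B,B) - 2 * mean K(A,B)."""
    return (rbf_kernel(A, A, gamma).mean()
            + rbf_kernel(B, B, gamma).mean()
            - 2.0 * rbf_kernel(A, B, gamma).mean())


def cohesion(C, gamma=1.0):
    """Mean squared feature-space distance of the points in C to their centroid
    (assumed cohesion score; lower means a more compact cluster)."""
    K = rbf_kernel(C, C, gamma)
    return float(np.mean(np.diag(K)) - K.mean())


def merge_clusters(X, labels, delta_d, gamma=1.0):
    """Repeatedly merge the closest pair of clusters while their feature-space
    distance does not exceed delta_d (assumed role of the threshold Δd)."""
    clusters = {c: X[labels == c] for c in np.unique(labels)}
    while len(clusters) > 1:
        keys = list(clusters)
        best = None  # (distance, key_a, key_b) of the closest pair found so far
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                d = feature_space_distance(clusters[keys[i]], clusters[keys[j]], gamma)
                if best is None or d < best[0]:
                    best = (d, keys[i], keys[j])
        d, a, b = best
        if d > delta_d:
            break  # even the nearest neighbouring clusters are too far apart
        clusters[a] = np.vstack([clusters[a], clusters.pop(b)])
    return clusters
```

For example, eight 2-D points forming two tight groups, each group initially split into two clusters, collapse to two clusters of four points with `delta_d=0.5`, while `delta_d=0.0` leaves all four initial clusters unmerged. The number of clusters returned is simply `len(result)`, so in this sketch the final count is controlled by the threshold rather than fixed in advance.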



