A Cell-based Clustering Method for Large High-dimensional Data in Data Mining

Jin, Du-Seok; Chang, Jae-Woo

Journal of KIISE: Databases

Vol. 28, No. 4 / pp. 558-567 / 2001 / pISSN 1229-7739

Korean Institute of Information Scientists and Engineers (KIISE)

Jin, Du-Seok (Dept. of Computer Engineering, Chonbuk National University);
Chang, Jae-Woo (Dept. of Computer Engineering, Chonbuk National University)

Published: 2001.12.01

Abstract

Data mining applications increasingly need to handle large volumes of high-dimensional data. Most existing data mining algorithms, however, do not work efficiently on such data because of the so-called curse of dimensionality [1] and the limits of available memory. To overcome these problems, this paper proposes a new cell-based clustering method that is more efficient than existing algorithms for large high-dimensional data. The method provides a cell construction algorithm for handling large high-dimensional data efficiently and a storage index structure based on filtering. We compare the proposed cell-based clustering method with the CLIQUE method in terms of clustering time, precision, and retrieval time. The experimental results show that our cell-based clustering method outperforms the CLIQUE method.
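The abstract does not describe the cell construction algorithm or the filtering-based index, so neither is reproduced here. As a rough illustration of the general idea behind cell-based (grid-based) clustering only, the following minimal sketch splits the data space into equal-width cells, keeps the cells that hold enough points, and merges dense cells that share a face. The names `cell_of`, `face_neighbors`, and `grid_clusters` and the parameters `cell_width` and `min_points` are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict


def cell_of(point, cell_width):
    """Return the integer grid coordinates of the cell containing the point."""
    return tuple(int(x // cell_width) for x in point)


def face_neighbors(cell):
    """Yield the cells that differ from `cell` by one step in one dimension."""
    for i in range(len(cell)):
        for step in (-1, 1):
            yield cell[:i] + (cell[i] + step,) + cell[i + 1:]


def grid_clusters(points, cell_width, min_points):
    """Cluster points by merging face-adjacent dense cells (generic sketch).

    cell_width and min_points are hypothetical parameters, not values
    taken from the paper.
    """
    # Step 1: assign points to cells; only occupied cells are stored.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, cell_width)].append(p)

    # Step 2: a cell is dense if it holds at least min_points points.
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    # Step 3: flood-fill over dense cells to form connected clusters.
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            cell = stack.pop()
            members.extend(cells[cell])
            for nb in face_neighbors(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(members)
    return clusters
```

Storing only occupied cells in a dictionary keeps memory proportional to the number of non-empty cells instead of the full grid, which matters because the number of possible cells grows exponentially with the number of dimensions. Points in cells that are not dense are left out of every cluster and can be treated as noise.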

References

Berchtold S., Bohm C., Keim D. and Kriegel H.-P., 'A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space,' ACM PODS Symposium on Principles of Database Systems, Tucson, Arizona, 1997, pp.78-86 https://doi.org/10.1145/263661.263671
Han J. and Kamber M., 'Data Mining: Concepts and Techniques,' Morgan Kaufmann, 2000
Ng R.T. and Han J., 'Efficient and Effective Clustering Methods for Spatial Data Mining,' Proc. 20th Int. Conf. on Very Large Data Bases, 1994, pp.144-155
Kaufman L. and Rousseeuw P.J., 'Finding Groups in Data: An Introduction to Cluster Analysis,' John Wiley & Sons, 1990
Zhang T., Ramakrishnan R. and Livny M., 'BIRCH: An Efficient Data Clustering Method for Very Large Databases,' Proc. ACM SIGMOD Int. Conf. on Management of Data, 1996, pp.103-114 https://doi.org/10.1145/233269.233324
Ester M., Kriegel H.-P., Sander J. and Xu X., 'A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,' Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, 1996, pp.226-231
Ester M., Kriegel H.-P., Sander J. and Xu X., 'Density-Connected Sets and Their Application for Trend Detection in Spatial Databases,' Proc. 3rd Int. Conf. on Knowledge Discovery and Data Mining, 1997, pp.10-15
Wang W., Yang J. and Muntz R., 'STING: A Statistical Information Grid Approach to Spatial Data Mining,' Proc. 23rd Int. Conf. on Very Large Data Bases, 1997, pp.186-195
Agrawal R., Gehrke J., Gunopulos D. and Raghavan P., 'Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,' Proc. ACM SIGMOD Int. Conf. on Management of Data, 1998, pp.94-105 https://doi.org/10.1145/276304.276314
Breiman L., Friedman J. H., Olshen R. A. and Stone C. J., 'Classification and Regression Trees,' Wadsworth, Belmont, 1984
http://WWW.almaden.ibm.com/cs/quest
