[KSCI] Korea Science Citation Index Service

Feature Weighting in Projected Clustering for High Dimensional Data

Park, Jong-Soo (School of Computer and Information, Sungshin Women's University)

Publication Information

Journal of KIISE: Databases / v.32, no.3, 2005, pp. 228-242

Abstract

Projected clustering seeks to find clusters in different subspaces of a high-dimensional dataset. We propose an algorithm that discovers near-optimal projected clusters without user-specified parameters such as the number of output clusters or the average cardinality of the subspaces of the projected clusters. The objective function of the algorithm computes the projected energy, the quality, and the number of outliers at each step of clustering. To minimize the projected energy and maximize the quality of the clustering, we first find the best subspace of each cluster, based on the density of the input points, by comparing the standard deviations along all dimensions. A weighting factor for each dimension of the subspace is used to get rid of probable error in measuring projected distances. Our extensive experiments show that the algorithm discovers projected clusters accurately and scales to large data sets.
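
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes that a cluster's subspace consists of the dimensions whose standard deviation is below the average over all dimensions, that each selected dimension is weighted by the inverse of its standard deviation (normalized to sum to 1), and that the projected distance is a weighted Manhattan distance restricted to the subspace. The function names `select_subspace`, `dimension_weights`, and `weighted_projected_distance` are hypothetical.

```python
from statistics import pstdev


def select_subspace(points):
    """Return (dims, stds): dimensions with below-average spread (assumed rule)."""
    n_dims = len(points[0])
    # Population standard deviation of the cluster along each full-space dimension.
    stds = [pstdev([p[j] for p in points]) for j in range(n_dims)]
    mean_std = sum(stds) / n_dims
    dims = [j for j in range(n_dims) if stds[j] < mean_std]
    return dims, stds


def dimension_weights(dims, stds, eps=1e-12):
    """Assumed weighting: inverse standard deviation, normalized to sum to 1."""
    inverse = [1.0 / (stds[j] + eps) for j in dims]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_projected_distance(x, centroid, dims, weights):
    """Weighted Manhattan distance measured only on the subspace dimensions."""
    return sum(w * abs(x[j] - centroid[j]) for j, w in zip(dims, weights))


# Toy cluster: tight in dimensions 0 and 1, widely spread in dimension 2.
cluster = [(1.0, 5.0, 0.0), (1.1, 5.2, 40.0), (0.9, 4.8, 80.0), (1.0, 5.0, 120.0)]
centroid = tuple(sum(p[j] for p in cluster) / len(cluster) for j in range(3))

dims, stds = select_subspace(cluster)
weights = dimension_weights(dims, stds)

# A point far away in dimension 2 is still close in the projected subspace.
far_in_dim2 = (1.0, 5.0, 999.0)
distance = weighted_projected_distance(far_in_dim2, centroid, dims, weights)
print(dims, weights, distance)
```

In this toy cluster the widely spread third dimension is excluded from the subspace, so a point that differs only in that dimension has a projected distance of approximately zero to the centroid. The tighter of the two remaining dimensions receives the larger weight.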

Keywords

split; merge; clustering; projected clustering; weighting; algorithm

Citations & Related Records

Times Cited By KSCI: 1

Reference

1	T. Zhang, R. Ramakrishnan, and M. Livny, 'BIRCH: An Efficient Data Clustering Method for Large Databases,' In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 103-114, 1996
2	W. H. Beyer, CRC Standard Mathematical Tables, 28th Edition, CRC Press, 1987
3	심정욱, 손영숙, 백장선 (trans.), 수리통계학 [Mathematical Statistics], 4th Edition, 자유아카데미, 1999
4	C. C. Aggarwal and P. S. Yu, 'Finding generalized projected clusters in high dimensional spaces,' In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 70-81, 2000
5	C. M. Procopiuc, M. Jones, P. K. Agarwal, and T. M. Murali, 'A Monte Carlo algorithm for fast projective clustering,' In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2002
6	M. L. Yiu and N. Mamoulis, 'Frequent-Pattern based Iterative Projected Clustering,' In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), Melbourne, Florida, USA, November 2003
7	R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, 'Automatic subspace clustering of high dimensional data for data mining applications,' In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 94-105, 1998
8	L. Parsons, E. Haque, and H. Liu, 'Subspace Clustering for High Dimensional Data: A Review,' ACM SIGKDD Explorations, Vol. 6, Issue 1, pp. 90-105, June 2004
9	C. C. Aggarwal, C. M. Procopiuc, J. L. Wolf, P. S. Yu, and J. S. Park, 'Fast Algorithms for Projected Clustering,' In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 61-72, Philadelphia, PA, June 1-3, 1999
10	박종수, 김도형, 'An Effective Algorithm for Subdimensional Clustering of High Dimensional Data,' KIPS Transactions: Part D, Vol. 10-D, No. 3, pp. 417-426, June 2003
11	A. K. Jain, M. N. Murty, and P. J. Flynn, 'Data clustering: a review,' ACM Computing Surveys, 31(3):264-323, 1999
12	J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA, 2001

Feature Weighting in Projected Clustering for High Dimensional Data 고차원 데이타에 대한 투영 클러스터링에서 특성 가중치 부여