
Feature Weighting in Projected Clustering for High Dimensional Data  

Park, Jong-Soo (School of Computer Information, Sungshin Women's University)
Abstract
Projected clustering seeks to find clusters that lie in different subspaces of a high-dimensional data set. We propose an algorithm that discovers near-optimal projected clusters without user-specified parameters such as the number of output clusters or the average dimensionality of the clusters' subspaces. At each step of clustering, the algorithm's objective function computes the projected energy, the clustering quality, and the number of outliers. To minimize projected energy and maximize quality, we first find the best subspace for each cluster from the density of the input points, by comparing their standard deviations across all dimensions. A weighting factor for each dimension of the subspace is then used to eliminate probable errors in measuring projected distances. Our extensive experiments show that the algorithm discovers projected clusters accurately and scales to large data sets.
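The abstract describes the main steps only at a high level: per-dimension standard deviations select each cluster's subspace, per-dimension weights correct the projected distance, and projected energy measures how compact a cluster is within its subspace. The Python sketch below illustrates one plausible reading of those steps under stated assumptions; it is not the author's algorithm. The threshold `ratio`, the inverse-spread weighting scheme, the definition of projected energy as the mean squared projected distance to the centroid, and all helper names (`select_subspace`, `dimension_weights`, `projected_distance`, `projected_energy`) are hypothetical.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def dim_std(points: List[Vector], j: int) -> float:
    """Population standard deviation of coordinate j over the given points."""
    return statistics.pstdev(p[j] for p in points)


def centroid(cluster: List[Vector]) -> List[float]:
    """Coordinate-wise mean of the cluster members."""
    return [statistics.fmean(col) for col in zip(*cluster)]


def select_subspace(data: List[Vector], cluster: List[Vector],
                    ratio: float = 0.5) -> List[int]:
    """Keep dimension j when the cluster is much tighter along j than the
    whole data set. `ratio` is a hypothetical threshold for illustration."""
    return [j for j in range(len(data[0]))
            if dim_std(cluster, j) < ratio * dim_std(data, j)]


def dimension_weights(cluster: List[Vector], dims: List[int],
                      eps: float = 1e-9) -> List[float]:
    """Assumed weighting: inverse of the cluster's spread along each selected
    dimension, normalized so the weights sum to len(dims)."""
    inv = [1.0 / (dim_std(cluster, j) + eps) for j in dims]
    total = sum(inv)
    return [len(dims) * w / total for w in inv]


def projected_distance(x: Vector, c: Vector, dims: List[int],
                       weights: List[float]) -> float:
    """Weighted Euclidean distance restricted to the subspace `dims`."""
    return math.sqrt(sum(w * (x[j] - c[j]) ** 2 for j, w in zip(dims, weights)))


def projected_energy(cluster: List[Vector], dims: List[int],
                     weights: List[float]) -> float:
    """Assumed definition: mean squared projected distance of the
    cluster members to their centroid."""
    c = centroid(cluster)
    return statistics.fmean(projected_distance(x, c, dims, weights) ** 2
                            for x in cluster)
```

Normalizing the weights to sum to the number of selected dimensions means that when the cluster is equally spread along every selected dimension, the weighted distance reduces to plain Euclidean distance in the subspace. For example, with a cluster that is tight in dimensions 0 and 1 but widely spread in dimension 2, `select_subspace` returns `[0, 1]`, and a point that differs from the centroid only in dimension 2 has a projected distance of 0.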
Keywords
split; merge; clustering; projected clustering; weighting; algorithm;