[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTD.2003.10D.3.417

An Effective Algorithm for Subdimensional Clustering of High Dimensional Data

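Park, Jong-Soo (School of Computer and Information Science, Sungshin Women's University)
Kim, Do-Hyung (School of Computer and Information Science, Sungshin Women's University)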

Publication Information

The KIPS Transactions: Part D, v.10D, no.3, 2003, pp. 417-426

Abstract

Finding clusters in high dimensional data is a well-known and important problem in data mining, because cluster analysis is widely used in applications such as pattern recognition, data analysis, and market analysis. Recently, a new framework, projected clustering, was suggested to solve the problem: it first selects the subdimensions of each candidate cluster and then assigns each input point to the nearest cluster according to a distance function based on the chosen subdimensions of the clusters. We propose a new algorithm for subdimensional clustering of high dimensional data that consists of three major steps: it partitions the input points into several candidate clusters with proper numbers of points, filters out the clusters that cannot be useful in the next steps, and then merges the remaining clusters into the predefined number of clusters using a closeness function. The results of extensive experiments show that the proposed algorithm exhibits better performance than other existing clustering algorithms.
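The assignment step of the projected clustering framework described above can be illustrated with a minimal sketch. The abstract does not specify how subdimensions are chosen or which distance function is used, so everything here is an assumption for illustration only: the helper names (`select_subdimensions`, `centroid`, `segmental_distance`, `assign`) are hypothetical, subdimensions are picked as the lowest-variance dimensions of each candidate cluster, and the distance is the Manhattan distance averaged over the chosen dimensions. It is not the authors' algorithm.

```python
from statistics import pvariance


def select_subdimensions(points, k_dims):
    """Pick the k_dims dimensions along which the points vary least (assumed criterion)."""
    n_dims = len(points[0])
    spread = [pvariance([p[d] for p in points]) for d in range(n_dims)]
    tightest = sorted(range(n_dims), key=lambda d: spread[d])[:k_dims]
    return sorted(tightest)


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def segmental_distance(point, center, dims):
    """Manhattan distance restricted to dims, averaged over the number of dims."""
    return sum(abs(point[d] - center[d]) for d in dims) / len(dims)


def assign(points, clusters):
    """Give each point the index of its nearest (center, dims) cluster."""
    return [
        min(
            range(len(clusters)),
            key=lambda i: segmental_distance(p, clusters[i][0], clusters[i][1]),
        )
        for p in points
    ]


# Toy candidate clusters: group_a is tight in dimensions 0 and 1,
# group_b is tight in dimensions 1 and 2.
group_a = [[1.0, 1.1, 9.0], [1.1, 0.9, 2.0], [0.9, 1.0, 5.0]]
group_b = [[7.0, 5.0, 5.1], [2.0, 5.1, 5.0], [4.0, 4.9, 4.9]]

clusters = [(centroid(g), select_subdimensions(g, 2)) for g in (group_a, group_b)]
labels = assign([[1.0, 1.0, 0.0], [9.0, 5.0, 5.0]], clusters)
print(labels)
```

Each cluster carries its own set of dimensions, so a point is compared with every cluster only along the dimensions in which that cluster is compact. Averaging over the number of chosen dimensions keeps distances comparable between clusters that use different numbers of subdimensions.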

Keywords

Data Mining; Clustering; High Dimensional Data; Algorithm

Citations & Related Records

Times Cited By KSCI: 2

Cited By KSCI

1	Clustering Technique Using Relevance of Data and Applied Algorithms / Han Woo-Yeon; Nam Mi-Young; Rhee PhillKyu / The KIPS Transactions: Part B
2	Feature Weighting in Projected Clustering for High Dimensional Data / Park, Jong-Soo / Journal of KIISE: Databases
