http://dx.doi.org/10.3745/KIPSTD.2003.10D.3.417

An Effective Algorithm for Subdimensional Clustering of High Dimensional Data  

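Park, Jong-Soo (School of Computer and Information Science, Sungshin Women's University)
Kim, Do-Hyung (School of Computer and Information Science, Sungshin Women's University)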
Abstract
Finding clusters in high dimensional data is a well-known and important problem in data mining, because cluster analysis is widely used in applications such as pattern recognition, data analysis, and market analysis. Recently, a new framework, projected clustering, was proposed to solve this problem: it first selects the subdimensions of each candidate cluster and then assigns each input point to the nearest cluster according to a distance function based on the chosen subdimensions of the clusters. We propose a new algorithm for subdimensional clustering of high dimensional data that consists of three major steps: it partitions the input points into several candidate clusters with proper numbers of points, filters out the clusters that cannot be useful in the next steps, and then merges the remaining clusters into the predefined number of clusters using a closeness function. The results of extensive experiments show that the proposed algorithm performs better than other existing clustering algorithms.
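The assignment step of projected clustering described above can be illustrated with a minimal sketch. The abstract does not specify the distance function, so this example assumes the Manhattan segmental distance used in earlier projected clustering work (PROCLUS): the average absolute difference over a cluster's chosen subdimensions. The function names, sample points, centroids, and subdimension sets are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the distance function is an assumption
# (Manhattan segmental distance), not necessarily the one used in the paper.

def segmental_distance(point, centroid, dims):
    """Average absolute difference, measured only on the chosen subdimensions."""
    return sum(abs(point[d] - centroid[d]) for d in dims) / len(dims)

def assign_points(points, centroids, subdims):
    """Assign each point to the cluster that is nearest in that cluster's own subspace."""
    labels = []
    for p in points:
        dists = [segmental_distance(p, c, dims)
                 for c, dims in zip(centroids, subdims)]
        labels.append(dists.index(min(dists)))
    return labels

# Hypothetical 3-dimensional data with two candidate clusters.
points = [
    (1.0, 9.0, 5.0),
    (1.2, 0.5, 5.1),
    (8.0, 4.0, 0.1),
]
centroids = [(1.0, 5.0, 5.0), (7.0, 4.0, 0.0)]
subdims = [(0, 2), (1, 2)]  # cluster 0 ignores dimension 1; cluster 1 ignores dimension 0

labels = assign_points(points, centroids, subdims)
print(labels)
```

Because each cluster measures distance only on its own subdimensions, the first two points fall into the same cluster even though they differ widely in dimension 1, which that cluster ignores.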
Keywords
Data Mining; Clustering; High Dimensional Data; Algorithm;