An Incremental Similarity Computation Method in Agglomerative Hierarchical Clustering

  • Jung, Sung-young (Machine Intelligence Group, LG Electronics Institute of Technology)
  • Kim, Taek-soo (Machine Intelligence Group, LG Electronics Institute of Technology)
  • Published: 2001.12.01

Abstract

In data clustering in high-dimensional space, one of the main difficulties is the time-consuming computation of vector similarities. The problem is worse for agglomerative algorithms that use the group-average link and mean-centroid methods, because cluster similarities must be recomputed whenever a cluster center moves after a merge. To address this problem, we present an incremental method of similarity computation that replaces the time-consuming vector similarity calculation with scalar calculations, for several measures including squared distance, inner product, cosine, and minimum variance. Experimental results show that the method significantly speeds up clustering of very high-dimensional data.
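The abstract does not give the update rules, so the following is only a minimal Python sketch of one way to realize the idea, not the authors' formulation. It assumes each cluster is represented by the sum vector s of its members and that the m × m matrix of inner products s_i · s_j fits in memory. Because (s_a + s_b) · s_k = s_a · s_k + s_b · s_k, a merge updates every cached entry with one scalar addition instead of an O(d) dot product, and the four measures named above follow from these scalars and the cluster sizes. The names `IncrementalSimilarity` and `agglomerate` are hypothetical, and reading "minimum variance" as Ward's increase in within-cluster sum of squares is an assumption.

```python
import numpy as np


class IncrementalSimilarity:
    """Hypothetical sketch: caches G[i, j] = s_i . s_j, where s_i is the sum
    of the member vectors of cluster i, so that merges need scalar updates only."""

    def __init__(self, X):
        X = np.asarray(X, dtype=float)
        self.G = X @ X.T                  # one O(n^2 d) pass; singletons: s_i = x_i
        self.n = np.ones(len(X))          # cluster sizes
        self.active = set(range(len(X)))  # clusters not yet merged away

    def merge(self, a, b):
        """Merge cluster b into cluster a (s_a <- s_a + s_b) without touching X."""
        # Row a: (s_a + s_b) . s_k = s_a . s_k + s_b . s_k for every k.
        self.G[a, :] += self.G[b, :]
        # Column a: the same, for symmetry. Because row a was already updated,
        # G[a, a] ends up as |s_a|^2 + 2 s_a . s_b + |s_b|^2 = |s_a + s_b|^2.
        self.G[:, a] += self.G[:, b]
        self.n[a] += self.n[b]
        self.active.discard(b)

    def inner(self, i, j):
        # Centroid inner product c_i . c_j = (s_i . s_j) / (n_i n_j).
        return self.G[i, j] / (self.n[i] * self.n[j])

    def sq_dist(self, i, j):
        # |c_i - c_j|^2 = c_i . c_i + c_j . c_j - 2 c_i . c_j.
        return self.inner(i, i) + self.inner(j, j) - 2.0 * self.inner(i, j)

    def cosine(self, i, j):
        # Cosine is scale-invariant, so sums give the same value as centroids
        # (assumes no cluster has a zero sum vector).
        return self.G[i, j] / np.sqrt(self.G[i, i] * self.G[j, j])

    def ward(self, i, j):
        # Assumed minimum-variance measure: increase in within-cluster
        # sum of squares if clusters i and j are merged.
        ni, nj = self.n[i], self.n[j]
        return ni * nj / (ni + nj) * self.sq_dist(i, j)


def agglomerate(X, k, measure="ward"):
    """Hypothetical driver: merge the best pair until k clusters remain.
    "sq_dist" and "ward" are minimised; "inner" and "cosine" are maximised."""
    s = IncrementalSimilarity(X)
    score = getattr(s, measure)
    sign = 1.0 if measure in ("sq_dist", "ward") else -1.0
    members = {i: [i] for i in range(len(X))}
    while len(s.active) > k:
        act = sorted(s.active)
        pairs = [(i, j) for p, i in enumerate(act) for j in act[p + 1:]]
        # Every candidate is scored from cached scalars; no O(d) work here.
        a, b = min(pairs, key=lambda ij: sign * score(*ij))
        s.merge(a, b)
        members[a] += members.pop(b)
    return sorted(sorted(m) for m in members.values())


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    print(agglomerate(X, 2))  # → [[0, 1], [2, 3]]
```

For the inner product, the centroid value also equals the average inner product over all member pairs of the two clusters, so the same cache can serve the group-average link. The driver rescans all active pairs at each step for clarity; after the initial Gram matrix it does no d-dimensional work, but a practical version would likely keep a priority queue of pair scores.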
