Clustering Technique for Sequence Data Sets in Multidimensional Data Space

다차원 데이타 공간에서 시퀀스 데이타 세트를 위한 클러스터링 기법

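  • 이석룡 (Department of Information and Communication Engineering, Korea Advanced Institute of Science and Technology)
  • 임동혁 (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • 정진완 (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Published: 2001.12.01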

Abstract

Continuous data such as video streams and analog voice signals can be modeled as multidimensional data sequences (MDS's) in a feature space. In this paper, we investigate an effective clustering technique for such multidimensional data sequences. Each sequence is represented by a small number of hyper-rectangular clusters so that subsequent storage and similarity search can be performed efficiently. We present a clustering algorithm with linear complexity that guarantees a predefined level of clustering quality, and we show its effectiveness through experiments on various video data sets.
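
To make the idea of summarizing a sequence by a few hyper-rectangles concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a simple quality criterion that the abstract does not state: a rectangle may grow only while its longest edge stays within a hypothetical threshold `max_edge`. The names `HyperRect` and `cluster_sequence` are likewise illustrative. Each point is visited once, so the pass is linear in the sequence length.

```python
from dataclasses import dataclass
from typing import List, Sequence

Point = Sequence[float]


@dataclass
class HyperRect:
    """Axis-aligned bounding box over consecutive points of a sequence."""
    low: List[float]
    high: List[float]
    count: int = 1

    def extended(self, p: Point) -> "HyperRect":
        # Smallest rectangle that covers this rectangle and the point p.
        return HyperRect(
            [min(a, b) for a, b in zip(self.low, p)],
            [max(a, b) for a, b in zip(self.high, p)],
            self.count + 1,
        )

    def max_edge(self) -> float:
        # Longest side; used here as an assumed stand-in for clustering quality.
        return max(h - l for l, h in zip(self.low, self.high))


def cluster_sequence(points: Sequence[Point], max_edge: float) -> List[HyperRect]:
    """Greedy single pass: grow the current rectangle until the assumed
    quality bound would be violated, then start a new rectangle."""
    clusters: List[HyperRect] = []
    current = None
    for p in points:
        if current is None:
            current = HyperRect(list(p), list(p))
            continue
        candidate = current.extended(p)
        if candidate.max_edge() <= max_edge:
            current = candidate
        else:
            clusters.append(current)
            current = HyperRect(list(p), list(p))
    if current is not None:
        clusters.append(current)
    return clusters
```

For example, calling `cluster_sequence([(0.0, 0.0), (0.1, 0.05), (0.2, 0.1), (0.9, 0.9), (0.95, 0.85)], max_edge=0.3)` groups the first three points into one rectangle and the last two into another, because adding the fourth point would stretch the first rectangle beyond the bound.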
