http://dx.doi.org/10.3745/JIPS.2010.6.1.067

Approximate Clustering on Data Streams Using Discrete Cosine Transform  

Yu, Feng (Department of Computer Science, Southern Illinois University)
Oyana, Damalie (Department of Computer Science, Southern Illinois University)
Hou, Wen-Chi (Department of Computer Science, Southern Illinois University)
Wainer, Michael (Department of Computer Science, Southern Illinois University)
Publication Information
Journal of Information Processing Systems, v.6, no.1, 2010, pp. 67-78
Abstract
This study presents a clustering algorithm that operates on data transformed with the discrete cosine transform (DCT). It is a grid density-based algorithm that can identify clusters of arbitrary shape. Streaming data are transformed and then reconstructed as needed for clustering. Experimental results show that the DCT approximates a data distribution efficiently using only a small number of coefficients while preserving the clusters well. The grid-based clustering algorithm works well with DCT-transformed data, demonstrating the viability of the DCT for data stream clustering applications.
Keywords
Grid Density-Based Clustering; Approximate Cluster Analysis; Discrete Cosine Transform; Sampling; Data Reconstruction; Data Compression
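The pipeline summarized in the abstract (grid the data, keep a few DCT coefficients, reconstruct, cluster dense cells) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how coefficients are selected or how dense cells are grouped, so keeping a square low-frequency block, using a fixed density threshold, and joining 4-connected cells are all assumptions made here. The function names and the parameters `bins`, `keep`, and `threshold` are hypothetical, and the sketch works on a batch of 2-D points in the unit square rather than on a live stream.

```python
import numpy as np
from collections import deque


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x transforms x, C.T @ y inverts it."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # the DC row has its own scale factor
    return c


def compress_grid(points, bins, keep):
    """Count points per grid cell, then keep a keep x keep low-frequency DCT block.

    Assumption: points lie in [0, 1) x [0, 1); the block selection rule is
    illustrative and not taken from the paper.
    """
    hist = np.histogram2d(points[:, 0], points[:, 1],
                          bins=bins, range=[[0, 1], [0, 1]])[0]
    c = dct_matrix(bins)
    coeffs = c @ hist @ c.T            # separable 2-D DCT
    return coeffs[:keep, :keep].copy()


def reconstruct_grid(coeffs, bins):
    """Zero-pad the retained coefficients and apply the inverse 2-D DCT."""
    padded = np.zeros((bins, bins))
    k = coeffs.shape[0]
    padded[:k, :k] = coeffs
    c = dct_matrix(bins)
    return c.T @ padded @ c


def cluster_dense_cells(density, threshold):
    """Label 4-connected groups of cells with density > threshold (0 = sparse)."""
    dense = density > threshold
    labels = np.zeros(density.shape, dtype=int)
    n_clusters = 0
    for start in zip(*np.nonzero(dense)):
        if labels[start]:
            continue                   # already assigned to a cluster
        n_clusters += 1
        labels[start] = n_clusters
        queue = deque([start])
        while queue:                   # breadth-first flood fill
            r, col = queue.popleft()
            for nr, nc in ((r + 1, col), (r - 1, col), (r, col + 1), (r, col - 1)):
                inside = 0 <= nr < density.shape[0] and 0 <= nc < density.shape[1]
                if inside and dense[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = n_clusters
                    queue.append((nr, nc))
    return labels, n_clusters


def demo():
    """Cluster two synthetic Gaussian blobs from 144 of 1024 DCT coefficients."""
    rng = np.random.default_rng(0)
    blob_a = rng.normal(loc=0.25, scale=0.06, size=(2000, 2))
    blob_b = rng.normal(loc=0.75, scale=0.06, size=(2000, 2))
    points = np.vstack([blob_a, blob_b])
    bins, keep = 32, 12
    coeffs = compress_grid(points, bins, keep)
    density = reconstruct_grid(coeffs, bins)
    _, n_clusters = cluster_dense_cells(density, 0.5 * density.max())
    return n_clusters
```

Calling `demo()` returns the number of clusters found in the reconstructed grid. Because the DCT is linear, the coefficients of a grid could in principle be updated incrementally as points arrive; the sketch recomputes them from scratch only to stay short.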