http://dx.doi.org/10.14400/JDC.2022.20.2.217

Dynamic Subspace Clustering for Online Data Streams  

Park, Nam Hun (Dept. of Convergence Software, Anyang University)
Publication Information
Journal of Digital Convergence / v.20, no.2, 2022, pp. 217-223
Abstract
Subspace clustering for online data streams requires a large amount of memory because every subset of the data dimensions must be examined. To track the continuous change of clusters in a data stream within a finite memory space, this paper proposes a grid-based subspace clustering algorithm that uses memory efficiently. Given an n-dimensional data stream, the distribution of data items in the data space is monitored by a grid-cell list. When the frequency of data items in a grid-cell of the first-level list becomes high enough for it to become a unit grid-cell, a next-level grid-cell list is created as its child node so that clusters in all possible subspaces containing that grid-cell can be found. In this way, a subspace grid-cell tree of at most n levels is constructed, and a k-dimensional subspace cluster can be found at the k-th level of the tree. Experiments confirmed that, by expanding only the dense regions of the data space, the proposed method uses computing resources more efficiently while maintaining the same accuracy as the existing method.
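The level-by-level expansion described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The class name `SubspaceGridTree`, the fixed number of equal-width intervals per dimension (`bins`), the absolute density threshold (`min_count`), and the assumption that values lie in [0, 1) are all hypothetical choices made for illustration. Decay of old items and pruning of sparse cells, which a real stream algorithm would need to stay within finite memory, are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    count: int = 0
    # Child grid-cell list; created only once this cell becomes dense.
    children: Optional[dict] = None


class SubspaceGridTree:
    """Illustrative grid-cell tree: level k holds k-dimensional subspace cells."""

    def __init__(self, n_dims, bins=4, min_count=3):
        self.n_dims = n_dims
        self.bins = bins            # assumed: equal-width intervals per dimension
        self.min_count = min_count  # assumed: absolute density threshold
        self.root = {}              # first-level list: (dim, interval) -> Cell

    def _interval(self, value):
        # Values are assumed to lie in [0, 1).
        return min(int(value * self.bins), self.bins - 1)

    def insert(self, point):
        self._update(self.root, point, start_dim=0)

    def _update(self, cells, point, start_dim):
        for dim in range(start_dim, self.n_dims):
            key = (dim, self._interval(point[dim]))
            cell = cells.setdefault(key, Cell())
            cell.count += 1
            if cell.children is None and cell.count >= self.min_count:
                # The cell just became dense: open an empty next-level list.
                cell.children = {}
            elif cell.children is not None:
                # Already dense: refine using only higher-indexed dimensions,
                # so each subspace is reached by exactly one path.
                self._update(cell.children, point, dim + 1)

    def dense_cells(self, level):
        """Return dense cells at the given level as tuples of (dim, interval)."""
        found = []

        def walk(cells, path):
            for key, cell in cells.items():
                if cell.count < self.min_count:
                    continue
                here = path + (key,)
                if len(here) == level:
                    found.append(here)
                elif cell.children:
                    walk(cell.children, here)

        walk(self.root, ())
        return found


tree = SubspaceGridTree(n_dims=2, bins=4, min_count=3)
for _ in range(6):
    tree.insert((0.1, 0.1))   # repeated items make this region dense
tree.insert((0.9, 0.6))       # an isolated item stays at the first level
print(tree.dense_cells(1))    # [((0, 0),), ((1, 0),)]
print(tree.dense_cells(2))    # [((0, 0), (1, 0))]
```

In this sketch only cells that cross the threshold receive a child list, so sparse regions never consume memory beyond their first-level counter, which mirrors the idea of expanding only the dense space.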
Keywords
Data Streams; Clustering; Data Mining; Subspace Clustering; Online Data Mining