
A Real-Time Data Mining for Stream Data Sets  

Kim Jinhwa (Department of Business Administration, Sogang University)
Min Jin Young (Department of Business Administration, Sookmyung University)
Abstract
Stream data is a data set that accumulates continuously in data storage from a data source over time. In many cases, the size of this data set grows increasingly large. Mining information from such massive data consumes substantial resources such as storage, memory, and time. These characteristics make it difficult and expensive to use the full data set accumulated over time. On the other hand, if only recent data or a part of the whole data set is used to mine information or patterns, potentially useful information can be lost. To avoid this problem, we suggest a method that efficiently accumulates information over time in the form of rule sets. It requires much less storage than traditional mining methods. These accumulated rule sets are then used as prediction models in the future. According to theories of ensemble approaches, a combination of many prediction models, here in the form of systematically merged rule sets, performs better than a single prediction model. This study uses a customer data set to predict the buying power of customers based on their information. It tests the performance of the suggested method on this data set along with general prediction methods and compares their performance.
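The idea of accumulating weighted rule sets across batches, rather than storing the raw stream, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the rule miner, the weighting scheme, or the merge procedure, so `mine_rules`, `RuleStore`, the support-count weights, and the sample customer attributes below are all hypothetical stand-ins chosen only to show the general shape of the approach.

```python
from collections import defaultdict


def mine_rules(batch, target):
    """Stand-in miner (assumption): one single-attribute rule per
    (attribute=value, label) pair, weighted by its support count in the batch."""
    rules = defaultdict(float)
    for record in batch:
        label = record[target]
        for attr, value in record.items():
            if attr != target:
                # A rule is (condition, label); a condition is a tuple of (attr, value) pairs.
                rules[(((attr, value),), label)] += 1.0
    return rules


class RuleStore:
    """Keeps only merged, weighted rules; raw batches can be discarded after merging."""

    def __init__(self):
        self.weights = defaultdict(float)  # (condition, label) -> accumulated weight

    def merge(self, batch_rules):
        """Merge one batch's rule set by summing the weights of identical rules."""
        for rule, weight in batch_rules.items():
            self.weights[rule] += weight

    def predict(self, record, default=None):
        """Weighted vote over every stored rule whose condition matches the record."""
        votes = defaultdict(float)
        for (condition, label), weight in self.weights.items():
            if all(record.get(attr) == value for attr, value in condition):
                votes[label] += weight
        return max(votes, key=votes.get) if votes else default


# Two hypothetical batches arriving at different times.
batch1 = [
    {"income": "high", "age": "young", "buys": "yes"},
    {"income": "low", "age": "young", "buys": "no"},
]
batch2 = [
    {"income": "high", "age": "old", "buys": "yes"},
    {"income": "low", "age": "old", "buys": "no"},
]

store = RuleStore()
for batch in (batch1, batch2):
    store.merge(mine_rules(batch, "buys"))

print(store.predict({"income": "high", "age": "young"}))
```

In this sketch the store grows with the number of distinct rules rather than with the number of records, which is the storage saving the abstract refers to. A real implementation would presumably use a proper rule learner and a more careful weighting of old versus new rules.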
Keywords
Stream Data Sets; Real-Time Data Mining; Merging Rules; Weights on Rules; Predictions;