http://dx.doi.org/10.3745/KIPSTD.2003.10D.3.447

Finding Frequent Itemsets based on Open Data Mining in Data Streams  

Chang, Joong-Hyuk (Department of Computer Science, Graduate School, Yonsei University)
Lee, Won-Suk (Department of Computer Science, Yonsei University)
Abstract
Conventional data mining methodology assumes that the data set of a knowledge discovery process is fixed and available before the process begins. This assumption holds only when the target of data mining is the static knowledge embedded in a specific data set. In addition, a conventional data mining method requires considerable computing time to produce a mining result from a large data set. For these reasons, it is almost impossible to apply such a method to a real-time analysis task on a data stream, where new transactions are generated continuously and an up-to-date mining result that includes the newly generated transactions is needed as quickly as possible. To address this, this paper proposes a new mining concept, open data mining in a data stream. In open data mining, whenever a new transaction is generated, the updated mining result over all transactions, including the new one, is obtained instantly. To implement this mechanism efficiently, it is necessary to incorporate the delayed insertion of newly identified information from recent transactions as well as the pruning of insignificant information from the mining result of past transactions. The proposed algorithm is analyzed through a series of experiments to identify its characteristics.
Keywords
Open data mining; Data stream; Frequent itemset; Delayed-insertion; Itemset pruning; Realtime analysis;
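
The mechanism summarized in the abstract (update the result on every transaction, insert new itemsets late, prune insignificant ones) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the class name `OpenMiner`, the parameters `insert_threshold`, `prune_threshold`, and `max_size`, and the specific insertion and pruning rules are all assumptions made for illustration. Here an itemset of two or more items starts being counted only once all of its immediate subsets are monitored and have reached `insert_threshold`, and a monitored itemset is dropped when its estimated support falls below `prune_threshold`.

```python
from itertools import combinations


class OpenMiner:
    """Toy one-pass itemset monitor (illustrative assumption, not the paper's method)."""

    def __init__(self, min_support, insert_threshold, prune_threshold, max_size=3):
        self.min_support = min_support            # support needed to report an itemset
        self.insert_threshold = insert_threshold  # subset support needed before insertion
        self.prune_threshold = prune_threshold    # below this, a monitored itemset is dropped
        self.max_size = max_size                  # largest itemset size tracked
        self.n = 0                                # transactions seen so far
        self.counts = {}                          # monitored itemset (frozenset) -> count

    def _support(self, itemset):
        # Unmonitored itemsets have an estimated support of 0.
        return self.counts.get(itemset, 0) / self.n

    def add(self, transaction):
        """Process one new transaction and update the monitored counts."""
        items = sorted(set(transaction))
        self.n += 1

        # Single items are always monitored.
        for item in items:
            key = frozenset([item])
            self.counts[key] = self.counts.get(key, 0) + 1

        # Larger itemsets, smallest first, so subsets are handled before supersets.
        for size in range(2, min(self.max_size, len(items)) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                if key in self.counts:
                    self.counts[key] += 1
                elif all(self._support(key - {i}) >= self.insert_threshold
                         for i in combo):
                    # Delayed insertion: counting starts only now, so the
                    # stored count may underestimate the true count.
                    self.counts[key] = 1

        # Pruning: drop multi-item itemsets that have become insignificant.
        stale = [k for k in self.counts
                 if len(k) > 1 and self._support(k) < self.prune_threshold]
        for k in stale:
            del self.counts[k]

    def frequent(self):
        """Return the current result: itemsets whose estimated support is frequent."""
        return {k: c / self.n for k, c in self.counts.items()
                if c / self.n >= self.min_support}
```

Calling `add` once per arriving transaction and `frequent` at any time gives a result that reflects every transaction seen so far. Because insertion is delayed, the supports of late-inserted itemsets are estimates rather than exact values.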