[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTD.2003.10D.3.447

Finding Frequent Itemsets based on Open Data Mining in Data Streams

Chang, Joong-Hyuk (Department of Computer Science, Graduate School, Yonsei University)
Lee, Won-Suk (Department of Computer Science, Yonsei University)

Publication Information

The KIPS Transactions:PartD / v.10D, no.3, 2003, pp.447-458

Abstract

Conventional data mining methods assume that the data set of a knowledge discovery process is fixed and available before the process begins. This assumption holds only when the target of mining is the static knowledge embedded in a specific data set. In addition, a conventional mining method requires considerable computing time to produce a result from a large data set. For these reasons, such methods are almost impossible to apply to real-time analysis of a data stream, where new transactions are generated continuously and an up-to-date mining result that reflects the newest transaction is needed as quickly as possible. This paper proposes a new mining concept for this purpose: open data mining in a data stream. In open data mining, whenever a new transaction is generated, the updated mining result over all transactions, including the new one, is obtained instantly. Implementing this mechanism efficiently requires two operations: delayed insertion of newly identified information from recent transactions, and pruning of insignificant information from the mining result of past transactions. The characteristics of the proposed algorithm are analyzed through a series of experiments.
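
The abstract does not give the algorithm's details, so the following Python sketch is only an illustration of the general idea of one-pass itemset monitoring with delayed insertion and pruning, not the authors' method. The class name `OpenMiner`, the parameters `monitor_support` and `max_len`, and the choice to initialize a late-inserted itemset with the smallest count among its subsets (an upper bound, so reported counts may be overestimates) are all assumptions made for this example.

```python
from itertools import combinations


class OpenMiner:
    """Toy one-pass itemset monitor (illustrative assumptions only)."""

    def __init__(self, min_support, monitor_support, max_len=3):
        self.min_support = min_support          # threshold for reporting
        self.monitor_support = monitor_support  # threshold for insert/prune
        self.max_len = max_len                  # longest itemset tracked
        self.total = 0                          # transactions seen so far
        self.counts = {}                        # frozenset -> (estimated) count

    def _support(self, itemset):
        return self.counts.get(itemset, 0) / self.total

    def add(self, transaction):
        items = sorted(set(transaction))
        self.total += 1
        # Single items are always monitored.
        for item in items:
            key = frozenset([item])
            self.counts[key] = self.counts.get(key, 0) + 1
        # Longer itemsets are handled shortest first, so that every
        # (k-1)-subset is already up to date when a k-itemset is examined.
        for k in range(2, min(self.max_len, len(items)) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                if key in self.counts:
                    self.counts[key] += 1
                    continue
                subsets = [key - {i} for i in combo]
                # Delayed insertion: start monitoring only once every
                # subset is itself monitored and significant.
                if all(self._support(s) >= self.monitor_support for s in subsets):
                    # Assumption: use the smallest subset count as an
                    # upper-bound estimate of the missed occurrences.
                    self.counts[key] = min(self.counts[s] for s in subsets)
        # Pruning: drop multi-item sets that have become insignificant.
        stale = [s for s in self.counts
                 if len(s) > 1 and self._support(s) < self.monitor_support]
        for s in stale:
            del self.counts[s]

    def frequent(self):
        """Current result, available after every transaction."""
        return {s: c for s, c in self.counts.items()
                if c / self.total >= self.min_support}
```

Calling `add` once per arriving transaction and `frequent` at any time mirrors the "result available after every transaction" behavior described above. Because late-inserted counts are upper-bound estimates, this sketch can report false positives; a real implementation would need a principled error bound.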

Keywords

Open data mining; Data stream; Frequent itemset; Delayed-insertion; Itemset pruning; Realtime analysis

Citations & Related Records

Times Cited By KSCI: 2

Reference
Cited By KSCI

1	S. J. Stolfo, A. L. Prodromidis, S. Tselepis, W. Lee, D. Fan and P. K. Chan, JAM: Java agents for meta-learning over distributed databases, In Proc. of the KDD and AAAI Workshop on AI Methods on Fraud and Risk Management, 1997
2	R. Agrawal and R. Srikant, Fast algorithms for mining association rules, In Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sep., 1994
3	S. Brin, R. Motwani, J. D. Ullman and S. Tsur, Dynamic itemset counting and implication rules for market basket data, In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Tucson, AZ, pp.255-264, May, 1997
4	M. Charikar, K. Chen and M. Farach-Colton, Finding frequent items in data streams, In Proc. of the 29th Int'l Colloq. on Automata, Languages and Programming, 2002
5	A. Savasere, E. Omiecinski and S. Navathe, An efficient algorithm for mining association rules in large databases, In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, pp.432-444, Sep., 1995
6	S. Guha, R. Rastogi and K. Shim, CURE: A clustering algorithm for large databases, In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Seattle, WA, pp.73-84, June, 1998
7	A. Berson and S. J. Smith, Data Warehousing, Data Mining, and OLAP: On-Line Analytical Processing, McGraw-Hill, New York, pp.247-266, 1997
8	G. S. Manku and R. Motwani, Approximate frequency counts over data streams, In Proc. of the 28th Int'l Conference on Very Large Databases, Hong Kong, China, Aug., 2002
9	S. Gallant, G. Piatetsky-Shapiro and M. Tan, Value-based data mining for CRM, In Tutorial notes of the 7th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, San Francisco, CA, Aug., 2001
10	C. Hidber, Online association rule mining, In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Philadelphia, PA, pp.145-156, May, 1999
11	R. C. Agarwal, C. C. Aggarwal and V. V. V. Prasad, Depth first generation of long patterns, In Proc. of the 6th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, Boston, MA, pp.108-118, Sep., 2000
12	Y. Aumann, R. Feldman, O. Lipshtat and H. Mannila, Borders: An efficient algorithm for association generation in dynamic databases, Journal of Intelligent Information Systems, Vol.12, No.1, pp.61-73, 1999
13	V. Ganti, J. Gehrke and R. Ramakrishnan, DEMON: Mining and monitoring evolving data, In Proc. of the 16th Int'l Conference on Data Engineering, San Diego, CA, pp.439-448, Feb., 2000
14	S. Guha, R. Rastogi and K. Shim, ROCK: A robust clustering algorithm for categorical attributes, In Proc. of the 15th Int'l Conference on Data Engineering, Sydney, Australia, pp.512-521, May, 1999

1	Mining Association Rules in Multidimensional Stream Data / [Kim, Dae-In;Park, Joon;Kim, Hong-Ki;Hwang, Bu-Hyun;] / The KIPS Transactions:PartD
2	A Sliding Window Technique for Open Data Mining over Data Streams / [Chang Joong-Hyuk;Lee Won-Suk;] / The KIPS Transactions:PartD
3	Frequent Patten Tree based XML Stream Mining / [Hwang, Jeong-Hee;] / The KIPS Transactions:PartD
