[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTD.2008.15-D.6.741

Finding Frequent Itemsets Over Data Streams in Confined Memory Space

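Kim, Min-Jung (Samsung Electronics, Mobile Communications Division, GSM Handset MMI Development)
Shin, Se-Jung (Department of Computer Science, Yonsei University)
Lee, Won-Suk (Department of Computer Science, Yonsei University)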

Publication Information

The KIPS Transactions: Part D / v.15D, no.6, 2008, pp. 741-754

Abstract

Because a data stream is unbounded, the memory usage of a data mining process must be confined regardless of how much information the stream generates. For this purpose, this paper proposes the prime pattern tree (PPT) for finding frequent itemsets over data streams within a confined memory space. Unlike a node of a prefix tree, a node of a PPT maintains the information needed to estimate the current supports of several itemsets together. Grouping items into a prime pattern reduces the total number of nodes, and the number of items in a prime pattern is controlled by the split_delta parameter $S_{\delta}$. Both the size and the accuracy of the PPT are determined by $S_{\delta}$. Accuracy improves as $S_{\delta}$ becomes smaller, because when $S_{\delta}$ is large the frequencies of many itemsets have to be estimated. It is therefore important to consider the trade-off between the size of a PPT and the accuracy of the mining result. Based on this characteristic, the size and the accuracy of the PPT can be flexibly controlled by merging or splitting nodes during the mining process. To find all frequent itemsets over a data stream, the PPT replaces the prefix tree used in the previously proposed estDec method. This makes it possible to optimize memory usage when finding frequent itemsets over a data stream in a confined memory space. Finally, the performance of the proposed method is analyzed through a series of experiments that identify its various characteristics.
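
The size/accuracy trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the merge criterion (group consecutive itemsets on one prefix-tree path while their support gap stays within `s_delta`), the midpoint estimate for interior itemsets, and the names `merge_chain` and `estimate` are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical merged node: (first index, last index, largest count, smallest count)
Group = Tuple[int, int, int, int]


def merge_chain(counts: List[int], total: int, s_delta: float) -> List[Group]:
    """Group one root-to-leaf path of prefix-tree counts into merged nodes.

    counts[i] is the count of the itemset formed by the first i + 1 items on
    the path, so the list is non-increasing. A group keeps growing while the
    support gap between its first itemset and the next one is <= s_delta.
    (Assumed criterion, chosen only to show the trade-off.)
    """
    groups: List[Group] = []
    start = 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or (counts[start] - counts[i]) / total > s_delta:
            groups.append((start, i - 1, counts[start], counts[i - 1]))
            start = i
    return groups


def estimate(groups: List[Group], index: int) -> float:
    """Return the count of the itemset at `index` from the merged nodes.

    Only the largest and smallest counts of a group are stored, so the two
    end itemsets are exact and interior ones are estimated as the midpoint.
    """
    for start, end, largest, smallest in groups:
        if start <= index <= end:
            if index == start:
                return float(largest)
            if index == end:
                return float(smallest)
            return (largest + smallest) / 2
    raise IndexError(index)


if __name__ == "__main__":
    counts = [90, 89, 86, 60, 58, 30]  # made-up counts over 100 transactions
    for s_delta in (0.0, 0.05, 0.5):
        groups = merge_chain(counts, 100, s_delta)
        worst = max(abs(estimate(groups, i) - c) for i, c in enumerate(counts))
        print(f"s_delta={s_delta}: {len(groups)} nodes, max count error {worst}")
```

With these made-up counts, raising `s_delta` from 0.0 to 0.05 to 0.5 shrinks the path from 6 nodes to 3 to 2, while the worst count error grows from 0 to 1 to 15, which mirrors the trade-off stated above.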

Keywords

Data Mining; Data Stream; Frequent Itemsets

