[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7472/jksii.2014.15.3.101

Analysis and Evaluation of Frequent Pattern Mining Technique based on Landmark Window

Pyun, Gwangbum (Dept. of Computer Engineering, Sejong University)
Yun, Unil (Dept. of Computer Engineering, Sejong University)

Publication Information

Journal of Internet Computing and Services / v.15, no.3, 2014, pp. 101-107

Abstract

With the development of online services, databases have increasingly shifted from static structures to dynamic stream structures. Earlier data mining techniques have served as decision-making tools for tasks such as establishing marketing strategies and analyzing DNA. However, recent areas of interest such as sensor networks, robotics, and artificial intelligence need to analyze real-time data more quickly. Landmark window-based frequent pattern mining, one of the stream mining approaches, mines portions of a database or each individual transaction rather than all of the data at once. In this paper, we analyze and evaluate two well-known landmark window-based frequent pattern mining algorithms, Lossy counting and hMiner. When Lossy counting mines frequent patterns from a set of new transactions, it performs a union operation between the previous and current mining results. hMiner, a state-of-the-art algorithm based on the landmark window model, performs mining whenever a new transaction arrives. Because hMiner extracts frequent patterns as soon as a new transaction is entered, it yields up-to-date mining results that reflect real-time information; for this reason, such algorithms are also called online mining approaches. We compare the performance of the earlier algorithm, Lossy counting, with that of the more recent one, hMiner. As performance criteria, we first consider each algorithm's total runtime and average processing time per transaction. To compare the efficiency of their storage structures, we also measure their maximum memory usage. Lastly, we show how stably the two algorithms perform on databases whose number of items gradually increases.
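The batch-and-merge behaviour attributed to Lossy counting above can be illustrated with a minimal sketch of Lossy Counting (after Manku and Motwani) extended to itemsets. This is a simplification, not the evaluated implementation: it processes exactly one bucket of `w = ceil(1/epsilon)` transactions per call (the β = 1 case), keeps entries in a plain dictionary instead of the lattice structure described in the abstract, and caps pattern length with a hypothetical `max_len` parameter so that subset enumeration stays small.

```python
from collections import Counter
from itertools import combinations
from math import ceil


class LossyCounting:
    """Simplified Lossy Counting over itemsets, one bucket per call (beta = 1).

    Assumptions for illustration: transactions are sets of sortable items,
    entries live in a dict (not a lattice), and max_len is a hypothetical cap.
    """

    def __init__(self, epsilon, max_len=2):
        self.epsilon = epsilon
        self.width = ceil(1 / epsilon)  # bucket width w = ceil(1/epsilon)
        self.max_len = max_len
        self.n = 0                      # transactions seen since the landmark
        self.entries = {}               # itemset -> [f, delta]

    def process_bucket(self, transactions):
        """Count one bucket of w transactions, merge with earlier results, prune."""
        assert len(transactions) == self.width
        self.n += len(transactions)
        b_current = self.n // self.width  # id of the bucket just filled
        counts = Counter()
        for t in transactions:
            items = sorted(t)
            for k in range(1, self.max_len + 1):
                counts.update(combinations(items, k))
        # Union step: add batch counts to stored entries or create new ones.
        for itemset, c in counts.items():
            if itemset in self.entries:
                self.entries[itemset][0] += c
            else:
                self.entries[itemset] = [c, b_current - 1]
        # Drop entries whose upper bound f + delta does not exceed b_current.
        self.entries = {p: e for p, e in self.entries.items()
                        if e[0] + e[1] > b_current}

    def frequent(self, support):
        """Report itemsets with f >= (support - epsilon) * N."""
        threshold = (support - self.epsilon) * self.n
        return {p: e[0] for p, e in self.entries.items() if e[0] >= threshold}


if __name__ == "__main__":
    lc = LossyCounting(epsilon=0.5)
    lc.process_bucket([{"a", "b"}, {"a", "c"}])
    lc.process_bucket([{"a", "b"}, {"b"}])
    print(lc.frequent(0.75))  # {('a',): 3, ('b',): 2}
```

In this toy run, item `b` truly occurs 3 times but is reported with count 2, because its first-bucket entry was pruned; the undercount stays within the εN = 2 error bound that Lossy Counting guarantees.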
In terms of mining time and per-transaction processing time, hMiner is faster than Lossy counting. Because hMiner stores candidate frequent patterns in a hash structure, it can access them directly, whereas Lossy counting stores them in a lattice and must traverse multiple nodes to reach a candidate pattern. On the other hand, hMiner performs worse than Lossy counting in terms of maximum memory usage. hMiner must keep the complete information of each candidate frequent pattern in order to store it in a hash bucket, while Lossy counting's lattice reduces the stored information: items that appear in multiple patterns can be shared, so its memory usage is more efficient than hMiner's. However, hMiner is more efficient than Lossy counting in the scalability evaluation, for the following reasons. As the number of items increases, fewer items are shared among patterns, which weakens Lossy counting's memory efficiency. Furthermore, as the number of transactions grows, its pruning becomes less effective. From the experimental results, we conclude that landmark window-based frequent pattern mining algorithms are suitable for real-time systems, although they require a significant amount of memory. Hence, their data structures need to be made more efficient before they can also be used in resource-constrained environments such as wireless sensor networks (WSNs).
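The memory argument above (shared items in a lattice versus complete patterns in hash buckets) can be made concrete with a rough proxy. In the sketch below, `hash_cells` counts the items stored when every pattern is kept whole as a hash key, standing in for hMiner's hash buckets, and `trie_cells` counts nodes in a prefix tree over sorted patterns, standing in for Lossy counting's lattice. Both functions are hypothetical illustrations, not the algorithms' actual data structures, and they count stored items rather than bytes.

```python
from itertools import combinations


def hash_cells(patterns):
    """Items stored when each pattern is a complete hash key (hMiner-style proxy).

    A lookup is one hash probe, but no items are shared between keys.
    """
    return sum(len(p) for p in patterns)


def trie_cells(patterns):
    """Nodes in a prefix tree over sorted patterns (lattice-style proxy).

    Shared prefixes are stored once, but a lookup walks len(p) nodes.
    """
    root = {}
    nodes = 0
    for p in patterns:
        node = root
        for item in sorted(p):
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes


# Few distinct items: every subset of {a, b, c, d}, so prefixes overlap heavily.
dense = [p for k in range(1, 5) for p in combinations("abcd", k)]
# More distinct items: patterns share nothing, so the prefix tree saves nothing.
sparse = [("a", "b"), ("c", "d"), ("e", "f")]

print(hash_cells(dense), trie_cells(dense))    # 32 15
print(hash_cells(sparse), trie_cells(sparse))  # 6 6
```

With heavy overlap the prefix tree stores fewer than half as many items, while with disjoint patterns the two representations cost the same, which mirrors the abstract's explanation of why Lossy counting's memory advantage shrinks as the number of items grows.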

Keywords

Landmark window; Frequent pattern mining; Online mining; Performance evaluation; Scalability
