http://dx.doi.org/10.3745/KTSDE.2016.5.12.623

PPFP(Push and Pop Frequent Pattern Mining): A Novel Frequent Pattern Mining Method for Bigdata Frequent Pattern Mining  

Lee, Jung-Hun (Computer Institute, Dongguk University)
Min, Youn-A (SW-Centered University Project Group, Gachon University)
Publication Information
KIPS Transactions on Software and Data Engineering, v.5, no.12, 2016, pp. 623-634
Abstract
Most existing frequent pattern mining methods focus on time efficiency and rely heavily on primary memory. However, in the era of big data, the size of real-world databases to be mined is increasing exponentially, so primary memory alone is no longer sufficient to mine frequent patterns from large real-world data sets. Disk-based frequent pattern mining methods have been studied to solve this problem and improve scalability, but they are very time-consuming compared to memory-based methods. In this paper, we present PPFP, a novel disk-based approach for mining frequent itemsets from big data that reduces the main memory bottleneck. The PPFP algorithm is based on FP-growth, one of the most popular and efficient frequent pattern mining approaches. Mining with PPFP consists of two steps. (1) Constructing an IFP-tree: after constructing the FP-tree, we assign an index number to each node using a novel index numbering method, and then store the indexed FP-tree (IFP-tree) on disk as an IFP-table. (2) Mining frequent patterns with PPFP: frequent patterns are mined by expanding patterns with a stack-based PUSH-POP method (the PPFP method). Because this approach uses only a very small amount of memory for the recursive and time-consuming operations of the mining process, it improves both the scalability and the time efficiency of frequent pattern mining. The reported test results support these improvements.
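The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the paper's index numbering method or the layout of the IFP-table, so the sketch assumes that a node's index is simply its insertion order, that each table row holds `item`, `count`, and `parent`, and that an in-memory list stands in for the on-disk table. The function names `build_ifp_table` and `mine_with_stack` are hypothetical. What the sketch does show is how an FP-tree can be flattened into indexed rows and how FP-growth style pattern expansion can run on an explicit stack instead of recursion.

```python
from collections import Counter


def build_ifp_table(transactions, min_support):
    """Build an FP-tree and return it as a flat table of indexed rows.

    Assumption: a node's index is its insertion order; row 0 is the root.
    The list stands in for the disk-resident IFP-table.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_support}
    rows = [{"item": None, "count": 0, "parent": -1}]
    child_index = {}  # (parent index, item) -> node index
    for t in transactions:
        node = 0
        # Insert frequent items in descending support order (ties by name).
        for item in sorted(frequent & set(t), key=lambda i: (-counts[i], i)):
            key = (node, item)
            if key not in child_index:
                child_index[key] = len(rows)
                rows.append({"item": item, "count": 0, "parent": node})
            node = child_index[key]
            rows[node]["count"] += 1
    return rows


def mine_with_stack(rows, min_support):
    """Mine frequent itemsets from the table using an explicit stack."""

    def prefix_path(idx):
        # Follow parent indices up to the root, excluding the node itself.
        path = []
        idx = rows[idx]["parent"]
        while idx > 0:
            path.append(rows[idx]["item"])
            idx = rows[idx]["parent"]
        return tuple(reversed(path))

    # Conditional pattern base of each single item: (prefix path, count).
    base_by_item = {}
    for idx in range(1, len(rows)):
        entry = (prefix_path(idx), rows[idx]["count"])
        base_by_item.setdefault(rows[idx]["item"], []).append(entry)

    stack = [(frozenset([item]), base) for item, base in base_by_item.items()]
    patterns = {}
    while stack:
        pattern, base = stack.pop()  # POP a candidate pattern
        support = sum(count for _, count in base)
        if support < min_support:
            continue
        patterns[pattern] = support
        # Expand the pattern with every item seen in its prefix paths.
        for item in {i for path, _ in base for i in path}:
            new_base = [(path[:path.index(item)], count)
                        for path, count in base if item in path]
            stack.append((pattern | {item}, new_base))  # PUSH
    return patterns
```

Calling `build_ifp_table(transactions, min_support)` and passing the result to `mine_with_stack` returns a dictionary that maps each frequent itemset (as a `frozenset`) to its support. In a disk-based setting, only the stack entries would need to live in memory while the rows are read from storage; how PPFP actually achieves this is described in the paper, not here.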
Keywords
Big Data Mining; Frequent Pattern Mining; FP-Growth; IFP-Tree; PPFP