http://dx.doi.org/10.7472/jksii.2013.14.6.01

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports  

Ryang, Heungmo (Dept. of Computer Engineering, Sejong University)
Yun, Unil (Dept. of Computer Engineering, Sejong University)
Publication Information
Journal of Internet Computing and Services / v.14, no.6, 2013, pp. 1-8
Abstract
Data mining techniques are used to find important and meaningful information in huge databases. Pattern mining, which discovers useful patterns in such databases, is one of the most significant of these techniques. Frequent pattern mining, one form of pattern mining, extracts from a database the patterns whose frequencies are at least a minimum support threshold; these patterns are called frequent patterns. Traditional frequent pattern mining applies a single minimum support threshold to the whole database, and this single support model implicitly assumes that all items in the database have the same nature. In real-world applications, however, items can have different characteristics, so a pattern mining technique that reflects these characteristics is required. In a framework that ignores the natures of items, mining patterns that contain rare items requires setting the single minimum support threshold very low, which produces too many patterns, including ones made up of meaningless items. Conversely, if the threshold is set too high, patterns containing rare items cannot be mined at all. This dilemma is called the rare item problem. To solve it, early studies proposed approximate approaches that split the data into several groups according to item frequencies or that group related rare items together. Because these methods are approximate, however, they cannot find all frequent patterns, in particular rare frequent patterns. The pattern mining model with multiple minimum supports was therefore proposed to solve the rare item problem. In this model, each item has its own minimum support threshold, called its MIS (Minimum Item Support), which is calculated from the item's frequency in the database.
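The abstract states only that each MIS is calculated from item frequencies. As a minimal sketch, assuming the rule used by Liu, Hsu, and Ma's MSApriori, MIS(i) = max(β·f(i), LS), where f(i) is the support of item i, β (0 ≤ β ≤ 1) controls how closely MIS follows frequency, and LS is the least MIS any item may receive, the code below computes MIS values for a hypothetical toy database and contrasts them with a single threshold. The transactions, the parameter values, and the function names are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def item_supports(transactions):
    """Relative support of each item: the fraction of transactions containing it."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}

def compute_mis(supports, beta, ls):
    """Assumed MSApriori-style rule: MIS(i) = max(beta * sup(i), LS).

    beta (0 <= beta <= 1) controls how closely MIS follows item frequency;
    ls is the least MIS that any item may receive.
    """
    return {item: max(beta * s, ls) for item, s in supports.items()}

# Hypothetical toy database; "caviar" plays the role of a rare item.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

supports = item_supports(transactions)
mis = compute_mis(supports, beta=0.5, ls=0.1)

# A single threshold of 0.3 discards the rare item; per-item MIS values keep it.
single_minsup = 0.3
frequent_single = {i for i, s in supports.items() if s >= single_minsup}
frequent_mis = {i for i, s in supports.items() if s >= mis[i]}

print(sorted(frequent_single))  # ['bread', 'butter', 'milk']
print(sorted(frequent_mis))     # ['bread', 'butter', 'caviar', 'milk']
```

Here caviar has support 0.2, so it fails the single threshold of 0.3, but its MIS is floored at LS = 0.1 and it is kept; frequent items such as bread (support 0.8) receive a proportionally higher MIS of 0.4.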
By applying MIS values, the multiple minimum supports model finds all rare frequent patterns without generating meaningless patterns or losing significant ones. Candidate patterns are extracted during the mining process, and in the single minimum support model the frequency of every candidate is compared with the same single threshold. As a result, the characteristics of the items that make up each candidate pattern are not reflected, and the rare item problem arises. To address this, the multiple minimum supports model uses the lowest MIS value among the items of a candidate pattern as that pattern's minimum support threshold, so that the pattern's characteristics are taken into account. To mine frequent patterns, including rare frequent patterns, efficiently under this concept, tree-based algorithms of the multiple minimum supports model sort the items in the tree in MIS-descending order, whereas tree-based algorithms of the single minimum support model sort items in frequency-descending order. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and evaluate its performance against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. Experimental results show that the multiple minimum supports based algorithm outperforms the single minimum support based one, although it requires more memory to store MIS information. Both compared algorithms also show good scalability.
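The candidate threshold rule described above can be illustrated with a brute-force sketch. Assuming a hypothetical toy database and hard-coded MIS values, the code below enumerates every itemset that occurs in at least one transaction and keeps a pattern P only if sup(P) ≥ min over i in P of MIS(i). It is an exhaustive enumeration for illustration only, not the tree-based algorithm evaluated in the paper; it also lists the items in MIS-descending order, the order in which MIS-based trees arrange items. All names and values are assumptions.

```python
from itertools import combinations

# Hypothetical toy database and MIS values (assumed for illustration).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "caviar"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk", "butter"}),
]
mis = {"bread": 0.4, "milk": 0.4, "butter": 0.3, "caviar": 0.1}

def support(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t) / len(transactions)

def pattern_threshold(pattern, mis):
    """Minimum support of a candidate pattern: the lowest MIS among its items."""
    return min(mis[item] for item in pattern)

def mine_multiple_minsup(transactions, mis):
    """Brute force: test every itemset that occurs in at least one transaction."""
    candidates = set()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            candidates.update(frozenset(c) for c in combinations(items, k))
    frequent = {}
    for p in candidates:
        s = support(p, transactions)
        if s >= pattern_threshold(p, mis):
            frequent[p] = s
    return frequent

frequent = mine_multiple_minsup(transactions, mis)

# Items in MIS-descending order (ties broken by name), as MIS-based trees arrange them.
mis_order = sorted(mis, key=lambda item: (-mis[item], item))
print(mis_order)  # ['bread', 'milk', 'butter', 'caviar']

# Print each frequent pattern with its items in MIS-descending order.
for p, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(p, key=mis_order.index), round(s, 2))
```

In this toy data, {bread, milk, caviar} (support 0.2) is kept because its threshold is MIS(caviar) = 0.1, while {bread, milk, butter} (support 0.2) is rejected because its threshold is MIS(butter) = 0.3. A single threshold of 0.3 would lose every pattern containing caviar, and a single threshold of 0.1 would also report {bread, milk, butter}. Listing a pattern's items in MIS-descending order puts its lowest-MIS item last, and that item's MIS is the pattern's threshold; this is one way to read why MIS-based trees adopt this order.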
Keywords
Multiple minimum supports; Frequent pattern mining; Rare frequent patterns; Performance evaluation; Scalability