http://dx.doi.org/10.7472/jksii.2015.16.1.67

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints  

Yun, Unil (Dept. of Computer Engineering, Sejong University)
Pyun, Gwangbum (Dept. of Computer Engineering, Sejong University)
Publication Information
Journal of Internet Computing and Services / v.16, no.1, 2015, pp. 67-74
Abstract
In recent years, frequent itemset mining that considers the importance of each item has been studied intensively as one of the important issues in the data mining field. Depending on how item importance is used, itemset mining approaches are classified as weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weight of each item in large databases, and they discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, database analysis reveals the importance of a given transaction, because a transaction's weight is higher if it contains many items with high weights. We not only analyze the advantages and disadvantages of the best-known algorithms in the field of frequent itemset mining based on transactional weights but also compare their performance. As a representative of frequent itemset mining using transactional weights, WIS introduces the concept and strategies of transactional weights. In addition, there are several state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. The algorithms do not need an additional database scan after the WIT-tree has been constructed, since each node of the WIT-tree holds item information such as item and transaction IDs.
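The abstract does not give the formulas, so the following Python sketch only illustrates one common formulation. It assumes that a transaction's weight is the mean of its item weights and that the weighted support of an itemset is the share of total transaction weight held by the transactions containing it. The function names, item weights, and transactions are hypothetical and are not taken from the paper.

```python
def transaction_weight(transaction, item_weights):
    """Assumed definition: mean weight of the items in the transaction."""
    return sum(item_weights[item] for item in transaction) / len(transaction)


def weighted_support(itemset, transactions, item_weights):
    """Assumed definition: weight of the transactions containing `itemset`,
    divided by the weight of all transactions."""
    weights = [transaction_weight(t, item_weights) for t in transactions]
    covered = sum(w for t, w in zip(transactions, weights) if itemset <= t)
    return covered / sum(weights)


# Hypothetical toy data, for illustration only.
item_weights = {"A": 0.6, "B": 0.1, "C": 0.3}
transactions = [{"A", "B"}, {"A", "C"}, {"B", "C"}]

ws_a = weighted_support({"A"}, transactions, item_weights)
ws_bc = weighted_support({"B", "C"}, transactions, item_weights)
print(ws_a, ws_bc)
```

Under these assumptions an itemset is reported as weighted frequent when its weighted support reaches a user-chosen minimum threshold, so transactions that contain heavily weighted items contribute more than a plain frequency count would allow.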
In particular, traditional algorithms scan the database many times to mine weighted itemsets, whereas the algorithms based on the WIT-tree avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs combines itemsets by using the information of the transactions that contain all of the itemsets. WIT-FWIs-MODIFY has a unique feature that reduces the operations needed to calculate the frequency of a new itemset. WIT-FWIs-DIFF uses a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use two types of real datasets (i.e., dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the size of the database changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, and on the sparse dataset WIT-FWIs-DIFF has better mining efficiency than the other algorithms. Compared to the algorithms using the WIT-tree, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires far more computations than the others.
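The single-scan and itemset-combination ideas described above can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a vertical layout in which each item maps to the set of IDs of the transactions containing it, joins two itemsets by intersecting those ID sets, and reads the "difference of two itemsets" as a difference of ID sets. All names and the transaction weights in `tw` are hypothetical.

```python
def build_tidsets(transactions):
    """Single pass over the database: map each item to the IDs of the
    transactions that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def weight_sum(tids, tw):
    """Total weight of the transactions whose IDs are in `tids`."""
    return sum(tw[tid] for tid in tids)


# Hypothetical toy database and precomputed transaction weights.
transactions = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
tw = [2.0, 4.0, 1.0, 3.0]

tidsets = build_tidsets(transactions)

# Join by intersection: transactions that contain both A and B.
tids_ab = tidsets["A"] & tidsets["B"]

# Join by difference: transactions that contain A but not B.
diff_ab = tidsets["A"] - tidsets["B"]

# Both routes give the same weight for {A, B} without rescanning the data.
via_intersection = weight_sum(tids_ab, tw)
via_difference = weight_sum(tidsets["A"], tw) - weight_sum(diff_ab, tw)
print(via_intersection, via_difference)
```

The difference set is small when most transactions containing one itemset also contain the other, and the intersection is small in the opposite case, so which representation is cheaper depends on the density of the data.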
Keywords
Transaction weight; Data Mining; Frequent Itemset mining; Performance evaluation; Scalability;