http://dx.doi.org/10.4218/etrij.10.1510.0066

A Novel Approach for Mining High-Utility Sequential Patterns in Sequence Databases  

Ahmed, Chowdhury Farhan (Database Lab, Department of Computer Engineering, Kyung Hee University)
Tanbeer, Syed Khairuzzaman (Database Lab, Department of Computer Engineering, Kyung Hee University)
Jeong, Byeong-Soo (Database Lab, Department of Computer Engineering, Kyung Hee University)
Publication Information
ETRI Journal / v.32, no.5, 2010, pp. 676-686
Abstract
Mining sequential patterns is an important research issue in data mining and knowledge discovery, with broad applications. However, existing sequential pattern mining approaches consider only binary frequency values of items in sequences and assign equal importance to all distinct items. As a result, they cannot adequately represent many real-world scenarios. In this paper, we propose a novel framework for mining high-utility sequential patterns, which extracts more practically useful information from sequence databases in which items have non-binary frequency values in sequences and distinct items have different importance values. We also propose two new algorithms for mining high-utility sequential patterns: UtilityLevel, which uses a level-wise candidate generation approach, and UtilitySpan, which uses a pattern growth approach. Extensive performance analyses show that both algorithms are efficient and scalable for mining high-utility sequential patterns.
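To make the contrast between binary frequency and utility concrete, the following is a minimal illustrative sketch and not the UtilityLevel or UtilitySpan algorithm. It assumes that each sequence is a time-ordered list of itemsets with per-item quantities, that each distinct item has a profit value, and that the utility of a pattern in a sequence is the highest quantity-times-profit total over all of its occurrences. That last choice is an assumption made here for illustration; the abstract does not state the paper's utility definition. All names (`pattern_utility_in_sequence`, `support_and_utility`, `db`, `profit`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Itemset = Dict[str, int]   # item -> quantity (the non-binary frequency value)
Sequence = List[Itemset]   # time-ordered itemsets


def pattern_utility_in_sequence(
    pattern: List[str], seq: Sequence, profit: Dict[str, float]
) -> Optional[float]:
    """Return the best utility of `pattern` in `seq`, or None if it does not occur.

    Assumption: each pattern item must match in a strictly later itemset than
    the previous one, and the maximum over all occurrences is taken.
    """
    # best[k] = highest utility of any match of pattern[:k] seen so far
    best: List[Optional[float]] = [0.0] + [None] * len(pattern)
    for itemset in seq:
        # Iterate right to left so one itemset fills at most one pattern position.
        for k in range(len(pattern), 0, -1):
            item = pattern[k - 1]
            if item in itemset and best[k - 1] is not None:
                cand = best[k - 1] + itemset[item] * profit[item]
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[len(pattern)]


def support_and_utility(
    pattern: List[str], db: List[Sequence], profit: Dict[str, float]
) -> Tuple[int, float]:
    """Return (binary support, total utility) of `pattern` over the database."""
    support, utility = 0, 0.0
    for seq in db:
        u = pattern_utility_in_sequence(pattern, seq, profit)
        if u is not None:
            support += 1
            utility += u
    return support, utility


# Hypothetical toy data: three customer sequences and per-item profits.
db = [
    [{"a": 2}, {"b": 1, "c": 3}, {"a": 1}],
    [{"b": 4}, {"a": 1}, {"b": 2}],
    [{"c": 5}, {"c": 1}],
]
profit = {"a": 5.0, "b": 2.0, "c": 1.0}

print(support_and_utility(["a", "b"], db, profit))  # (2, 21.0)
print(support_and_utility(["c"], db, profit))       # (2, 8.0)
```

In this toy data, both patterns occur in two sequences, so a frequency-only miner cannot tell them apart, while their utilities differ (21.0 versus 8.0) once quantities and per-item importance are taken into account.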
Keywords
Data mining; sequential patterns; high-utility patterns; knowledge discovery