http://dx.doi.org/10.3745/KIPSTD.2005.12D.6.807

Mining Frequent Closed Sequences using a Bitmap Representation  

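Kim Hyung-Geun (Department of Computer, Information and Communications Engineering, Graduate School, Kangwon National University)
Whang Whan-Kyu (Division of Electrical, Electronic, Information and Communications Engineering, Kangwon National University)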
Abstract
Sequential pattern mining finds all frequent sequences that satisfy a minimum support threshold in a large database. However, when mining long frequent sequences, or when using very low support thresholds, the performance of existing algorithms often degrades dramatically. In this paper, we propose a novel sequential pattern mining algorithm that mines only closed frequent sequences, which form a small subset of the very large set of frequent sequences. Our algorithm generates candidate sequences with a depth-first search strategy so that it can prune effectively. Using a bitmap representation of the underlying database, it computes supports with bit operations and prunes sequences in much less time. A performance study shows that our algorithm outperforms previous algorithms.
Keywords
Sequential Pattern Mining; Closed Sequential Pattern
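The bitmap idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a vertical bitmap layout (one integer per item per sequence, bit j set when itemset j contains the item), handles only patterns whose elements are single items, and finds closed sequences with a naive post-filter rather than the pruning the paper describes. All function names (`build_bitmaps`, `s_step`, `mine`, `closed_only`) are hypothetical.

```python
from collections import defaultdict


def build_bitmaps(db):
    """Map each item to a list of per-sequence bitmaps (bit j = itemset j)."""
    bitmaps = defaultdict(lambda: [0] * len(db))
    for sid, seq in enumerate(db):
        for pos, itemset in enumerate(seq):
            for item in itemset:
                bitmaps[item][sid] |= 1 << pos
    return dict(bitmaps)


def s_step(pattern_bits, item_bits, lengths):
    """Extend a pattern with an item that occurs in a strictly later itemset."""
    out = []
    for p, i, n in zip(pattern_bits, item_bits, lengths):
        if p == 0:
            out.append(0)
            continue
        low = p & -p                                  # earliest position where the pattern ends
        after = ~((low << 1) - 1) & ((1 << n) - 1)    # every later position in this sequence
        out.append(after & i)
    return out


def support(bits):
    """Number of sequences whose bitmap is non-zero."""
    return sum(1 for b in bits if b)


def mine(db, min_sup):
    """Depth-first enumeration of frequent single-item-element sequences."""
    bitmaps = build_bitmaps(db)
    lengths = [len(seq) for seq in db]
    items = sorted(i for i, b in bitmaps.items() if support(b) >= min_sup)
    found = {}

    def dfs(pattern, bits):
        found[pattern] = support(bits)
        for item in items:
            ext = s_step(bits, bitmaps[item], lengths)
            if support(ext) >= min_sup:               # prune infrequent branches
                dfs(pattern + (item,), ext)

    for item in items:
        dfs((item,), bitmaps[item])
    return found


def is_subsequence(a, b):
    """True if tuple a appears in tuple b in order (not necessarily contiguously)."""
    it = iter(b)
    return all(x in it for x in a)


def closed_only(found):
    """Naive filter: keep patterns with no longer super-sequence of equal support."""
    return {
        p: s for p, s in found.items()
        if not any(s == s2 and len(q) > len(p) and is_subsequence(p, q)
                   for q, s2 in found.items())
    }
```

For the three sequences `a, b, c`, `a, c`, and `a, b, c` with a minimum support of 2, `mine` returns seven frequent sequences and `closed_only` keeps two of them, `('a', 'c')` with support 3 and `('a', 'b', 'c')` with support 2, which shows how much smaller the closed set can be.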