Mining Frequent Closed Sequences using a Bitmap Representation

Closed Frequent Sequence Mining Using Bitmaps

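  • 김형근 (Department of Computer, Information and Communication Engineering, Graduate School, Kangwon National University) ;
  • 황환규 (Division of Electrical, Electronic, Information and Communication Engineering, Kangwon National University)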
  • Published : 2005.12.01

Abstract

Sequential pattern mining finds all frequent sequences that satisfy a minimum support threshold in a large database. However, when the frequent sequences are long or the support threshold is very low, the performance of previously reported algorithms often degrades dramatically. In this paper, we propose a novel sequential pattern mining algorithm that mines only closed frequent sequences, which form a small subset of the very large set of frequent sequences. The algorithm generates candidate sequences with a depth-first search strategy so that it can prune effectively. Using a bitmap representation of the underlying database, it computes supports with bit operations and identifies the sequences to prune in much less time. A performance study shows that our algorithm outperforms previous algorithms.
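As a rough illustration of bitmap-based support counting combined with depth-first candidate generation, the sketch below follows the general idea of the SPAM-style sequence-extension step. It is not the authors' implementation: it assumes every element of a sequence is a single item (no itemsets), uses Python integers as bitmaps, and the names `build_bitmaps`, `extend`, `support_of`, and `mine` are hypothetical.

```python
def build_bitmaps(database):
    """Map each item to one integer bitmap per sequence.

    Bit i of bitmaps[item][sid] is set when `item` occurs at position i
    of sequence `sid`. Assumption: each position holds a single item.
    """
    bitmaps = {}
    for sid, seq in enumerate(database):
        for pos, item in enumerate(seq):
            bitmaps.setdefault(item, [0] * len(database))[sid] |= 1 << pos
    return bitmaps


def extend(prefix_bits, item_bits):
    """Sequence extension: keep occurrences of the item that lie strictly
    after the earliest position where the prefix can end."""
    out = []
    for p, b in zip(prefix_bits, item_bits):
        if p == 0:
            out.append(0)              # prefix absent from this sequence
            continue
        lowest = p & -p                # earliest end position of the prefix
        after = ~((lowest << 1) - 1)   # every bit strictly above `lowest`
        out.append(after & b)
    return out


def support_of(bits):
    """Support is the number of sequences with a non-zero bitmap."""
    return sum(1 for b in bits if b)


def mine(database, min_support):
    """Depth-first enumeration of frequent sequences (no closedness check)."""
    bitmaps = build_bitmaps(database)
    frequent = {}

    def dfs(pattern, bits):
        for item, item_bits in bitmaps.items():
            new_bits = extend(bits, item_bits)
            sup = support_of(new_bits)
            if sup >= min_support:     # infrequent branches are pruned here
                frequent[pattern + item] = sup
                dfs(pattern + item, new_bits)

    for item, item_bits in bitmaps.items():
        sup = support_of(item_bits)
        if sup >= min_support:
            frequent[item] = sup
            dfs(item, item_bits)
    return frequent


database = ["abcab", "acb", "bca", "ab"]   # toy data, one character per item
frequent = mine(database, min_support=2)
```

Because each extension only needs a shift, a complement, and an AND per sequence, the support of a candidate is obtained without rescanning the database, which is the kind of saving the abstract attributes to the bitmap representation.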

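Research on sequential pattern mining addresses the problem of finding, in a large database, the frequent sequences that satisfy a user-specified minimum support. However, the sequential pattern mining methods proposed so far suffer a sharp drop in performance when the frequent sequences become long or the minimum support is set relatively low, because the number of generated sequences grows exponentially. To solve this problem, this paper proposes a method for finding closed frequent sequences, which retain the information of all frequent sequences while being far fewer in number. To prune efficiently, the proposed algorithm generates candidate sequences by depth-first search, represents the database as bitmaps, and computes support efficiently with bit operations. In addition, by exploiting the properties of bitmap-represented sequences, it can find the sequences to prune at low computational cost. Performance experiments confirm that, thanks to these advantages, the proposed method finds closed frequent sequences much faster than previously proposed algorithms.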
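To make the notion of a closed frequent sequence concrete, the sketch below uses the standard definition: a frequent sequence is closed when no proper supersequence has the same support. It is a brute-force post-filter over an already mined result, given only for illustration; the proposed algorithm prunes during the search rather than filtering afterwards. The function names and the toy supports are hypothetical, and single-item elements are assumed.

```python
def is_subsequence(short, long):
    """True when the items of `short` appear in `long` in the same order."""
    it = iter(long)
    return all(item in it for item in short)


def closed_only(frequent):
    """Keep the sequences that no longer sequence with equal support contains."""
    closed = {}
    for seq, sup in frequent.items():
        absorbed = any(
            len(other) > len(seq)
            and other_sup == sup
            and is_subsequence(seq, other)
            for other, other_sup in frequent.items()
        )
        if not absorbed:
            closed[seq] = sup
    return closed


# Toy input: frequent sequences with their supports (illustrative values).
frequent = {"a": 3, "b": 3, "ab": 3, "c": 2, "ac": 2}
closed = closed_only(frequent)
```

In this toy input, "a" and "b" are absorbed by "ab" and "c" by "ac", so two closed sequences stand in for five frequent ones, and the support of every dropped sequence can still be read off a closed supersequence.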

