Mining Frequent Closed Sequences using a Bitmap Representation

Kim Hyung-Geun; Whang Whan-Kyu

doi:10.3745/KIPSTD.2005.12D.6.807

The KIPS Transactions:PartD (정보처리학회논문지D)

Volume 12D, Issue 6 (Serial No. 102), pp. 807-816, 2005, pISSN 1598-2866

Korea Information Processing Society (한국정보처리학회)

비트맵을 사용한 닫힌 빈발 시퀀스 마이닝

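Kim Hyung-Geun (Department of Computer and Information Communication Engineering, Graduate School, Kangwon National University);
Whang Whan-Kyu (Division of Electrical, Electronic, and Information Communication Engineering, Kangwon National University)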

Published : 2005.12.01

https://doi.org/10.3745/KIPSTD.2005.12D.6.807

Abstract

Sequential pattern mining finds all frequent sequences that satisfy a minimum support threshold in a large database. However, when mining long frequent sequences, or when using very low support thresholds, the performance of previously reported algorithms often degrades dramatically. In this paper, we propose a novel sequential pattern mining algorithm that mines only closed frequent sequences, which form a small subset of the typically very large set of frequent sequences. Our algorithm generates candidate sequences with a depth-first search strategy so that pruning is effective. Using a bitmap representation of the underlying database, it computes supports with bit operations and identifies sequences to prune in much less time. A performance study shows that our algorithm outperforms previous algorithms.

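Research on sequential pattern mining addresses the problem of finding, in a large database, the frequent sequences that satisfy a user-specified minimum support. Existing sequential pattern mining methods, however, suffer a sharp drop in performance when the frequent sequences are long or the minimum support is set relatively low, because the number of generated sequences grows exponentially. To address this problem, this paper proposes a method for finding closed frequent sequences, which carry the information of all frequent sequences while being far fewer in number. The proposed algorithm generates candidate sequences by depth-first search so that pruning can be performed efficiently, and it represents the database as bitmaps so that support can be computed efficiently with bit operations. In addition, by exploiting properties of the bitmap-represented sequences, the sequences to prune can be found at low computational cost. Performance experiments confirm that, thanks to these advantages, the proposed method finds closed frequent sequences much faster than previously proposed algorithms.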
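The abstract names three ingredients: a bitmap representation, support counting by bit operations, and depth-first candidate generation restricted to closed sequences. The sketch below illustrates these ideas on a toy database and is not the authors' algorithm. It assumes one item per sequence position (no itemsets), stores each per-sequence bitmap as a Python integer, and replaces the paper's in-search pruning with a naive post-filter for closedness. All function names (`build_bitmaps`, `s_step`, `extend`, `mine_frequent`, `closed_only`) are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def build_bitmaps(db: List[List[str]]) -> Dict[str, List[int]]:
    """Map each item to one int per sequence; bit j is set iff the item is at position j."""
    bitmaps: Dict[str, List[int]] = {}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            bitmaps.setdefault(item, [0] * len(db))[sid] |= 1 << pos
    return bitmaps


def s_step(bits: int, length: int) -> int:
    """Return a mask of all positions strictly after the first set bit (0 if none)."""
    if bits == 0:
        return 0
    low = bits & -bits                      # lowest set bit = earliest match
    full = (1 << length) - 1                # all valid positions of the sequence
    return full & ~((low << 1) - 1)


def extend(prefix: List[int], item_bm: List[int], lengths: List[int]) -> List[int]:
    """Bitmap of 'prefix followed later by item': shift-style mask, then AND."""
    return [s_step(p, n) & b for p, b, n in zip(prefix, item_bm, lengths)]


def support(bm: List[int]) -> int:
    """Number of sequences in which the pattern occurs at least once."""
    return sum(1 for b in bm if b)


def mine_frequent(db: List[List[str]], min_sup: int) -> Dict[Pattern, int]:
    """Depth-first enumeration of all frequent sequences with their supports."""
    bitmaps = build_bitmaps(db)
    lengths = [len(s) for s in db]
    items = sorted(i for i, bm in bitmaps.items() if support(bm) >= min_sup)
    result: Dict[Pattern, int] = {}

    def dfs(pattern: Pattern, bm: List[int]) -> None:
        result[pattern] = support(bm)
        for item in items:
            ext = extend(bm, bitmaps[item], lengths)
            if support(ext) >= min_sup:     # infrequent extensions are pruned
                dfs(pattern + (item,), ext)

    for item in items:
        dfs((item,), bitmaps[item])
    return result


def is_subsequence(small: Pattern, big: Pattern) -> bool:
    """True iff 'small' can be obtained from 'big' by deleting elements."""
    it = iter(big)
    return all(x in it for x in small)


def closed_only(freq: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns that have no proper super-sequence with the same support."""
    return {
        p: s for p, s in freq.items()
        if not any(len(q) > len(p) and t == s and is_subsequence(p, q)
                   for q, t in freq.items())
    }


db = [list("abc"), list("abc"), list("ac"), list("bc")]
frequent = mine_frequent(db, min_sup=2)
closed = closed_only(frequent)
print(len(frequent), "frequent sequences,", len(closed), "closed")
```

On this toy database the closed set is smaller than the full frequent set, yet the support of every frequent sequence can still be recovered from it as the largest support among its closed super-sequences. A practical implementation would detect non-closed candidates during the search rather than filtering afterwards.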
