http://dx.doi.org/10.3745/KIPSTD.2011.18D.2.081

Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework  

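Kim, Jin-Hyun (School of Electrical Engineering and Computer Science, Seoul National University)
Shim, Kyu-Seok (School of Electrical Engineering and Computer Science, Seoul National University)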
Abstract
Sequential pattern mining, which finds the frequent patterns appearing in a given set of sequences, is an important data mining problem with broad applications. For example, it can discover web access patterns, customer purchase patterns, and DNA sequences associated with specific diseases. In this paper, we develop sequential pattern mining algorithms on the MapReduce framework. Our algorithms distribute the input data across several machines and find frequent sequential patterns in parallel. Using synthetic data sets, we conducted a comprehensive performance study while varying several parameters. Our experimental results show that our algorithms achieve linear speedup as the number of machines increases.
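To illustrate the general idea of distributing the input and counting pattern support in parallel, the following is a minimal single-process sketch in Python. It is not the algorithm developed in the paper: it assumes each sequence is a plain list of single items, enumerates all subsequences of one fixed length by brute force, and simulates the map and reduce steps with ordinary functions. The names `map_partition`, `reduce_counts`, and `frequent_patterns` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_partition(sequences, length):
    """Map step (simulated): emit (pattern, 1) once per sequence containing it."""
    for seq in sequences:
        # combinations() yields increasing index tuples, so item order is kept.
        # The set ensures a sequence supports each pattern at most once.
        patterns = {
            tuple(seq[i] for i in idx)
            for idx in combinations(range(len(seq)), length)
        }
        for pattern in patterns:
            yield pattern, 1


def reduce_counts(pairs):
    """Reduce step (simulated): sum the emitted counts for each pattern."""
    counts = Counter()
    for pattern, n in pairs:
        counts[pattern] += n
    return counts


def frequent_patterns(partitions, length, min_support):
    """Return patterns of the given length supported by >= min_support sequences."""
    # Each partition stands in for the data assigned to one machine (assumption).
    pairs = (
        pair
        for partition in partitions
        for pair in map_partition(partition, length)
    )
    counts = reduce_counts(pairs)
    return {p: c for p, c in counts.items() if c >= min_support}
```

In this sketch each partition plays the role of the input split handled by one machine. A real MapReduce job would run the map step on separate workers and group the emitted pairs by key before reducing.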
Keywords
Data Mining; Sequential Pattern Mining; MapReduce; Hadoop; Parallel Processing;