Implementation of Subsequence Mapping Method for Sequential Pattern Mining

  • Trang Nguyen Thu (Database and Bioinformatics Laboratory, Chungbuk National University)
  • Lee Bum-Ju (Database and Bioinformatics Laboratory, Chungbuk National University)
  • Lee Heon-Gyu (Database and Bioinformatics Laboratory, Chungbuk National University)
  • Park Jeong-Seok (College of Electrical, Electronic & Information Engineering, ChungJu National University)
  • Ryu Keun-Ho (Database and Bioinformatics Laboratory, Chungbuk National University)
  • Published : 2006.10.31

Abstract

Sequential pattern mining addresses the problem of discovering the maximal frequent sequences that exist in a given database. In daily and scientific life, sequential data are available and used everywhere, in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and medical record histories. Discovering sequential patterns can help users and scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding frequent sequences, that is, sequences that occur frequently across the data sequences. Besides discovering frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and determining which of those sequences are frequent. So before mining sequences, the main task is to check whether one sequence is a subsequence of another sequence in the database. In this paper, we implement the subsequence matching method as a preprocessing step for sequential pattern mining. The sequences matched in our implementation are normalized sequences in the form of number chains. The result produced by this method is a summary of the matching information between the input mapped sequences.
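
The subsequence check described in the abstract can be illustrated with a minimal Python sketch. This is not the implementation from the paper. It assumes that each sequence has already been normalized to an ordered list of itemsets whose items are integers, and it uses the usual containment definition from the sequential pattern mining literature: a candidate is a subsequence of a data sequence if its itemsets can be matched, in order, to itemsets of the data sequence that contain them. The names `seq`, `is_subsequence`, and `support` are hypothetical and chosen only for this example.

```python
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[int]
Sequence = List[Itemset]


def seq(*itemsets: Iterable[int]) -> Sequence:
    """Hypothetical helper: build a normalized sequence from integer itemsets."""
    return [frozenset(items) for items in itemsets]


def is_subsequence(candidate: Sequence, data: Sequence) -> bool:
    """Return True if every itemset of `candidate` is contained, in order,
    in some itemset of `data` (greedy left-to-right matching)."""
    pos = 0  # index of the next candidate itemset still to be matched
    for itemset in data:
        if pos < len(candidate) and candidate[pos] <= itemset:
            pos += 1
    return pos == len(candidate)


def support(candidate: Sequence, database: List[Sequence]) -> int:
    """Count the data sequences that contain `candidate` as a subsequence."""
    return sum(1 for data in database if is_subsequence(candidate, data))


# Illustrative data only; these sequences are not taken from the paper.
database = [
    seq([1], [2, 3], [4]),
    seq([1, 2], [4]),
    seq([2], [3]),
]
candidate = seq([1], [4])
matches = [is_subsequence(candidate, data) for data in database]
```

Matching each candidate itemset to the earliest data itemset that contains it is sufficient here, because an earlier match never rules out a later one. A mining algorithm would call a count such as `support` for each candidate and keep those that meet a minimum support threshold.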
