Mining Frequent Sequential Patterns over Sequence Data Streams with a Gap-Constraint

순차 데이터 스트림에서 발생 간격 제한 조건을 활용한 빈발 순차 패턴 탐색

  • Joong Hyuk Chang (School of Computer and IT Engineering, Daegu University)
  • Received : 2010.05.31
  • Accepted : 2010.07.22
  • Published : 2010.09.30

Abstract

Sequential pattern mining is one of the essential data mining tasks and is widely used to analyze data generated in application fields such as web-based applications, e-commerce, bioinformatics, and USN environments. Recently, data generated in these fields has taken the form of continuous data streams, whose elements are generated continuously, rather than finite stored data sets. In response, much research has been actively conducted on efficiently finding sequential patterns over data streams. However, previous work has focused mainly on reducing the processing time and memory usage of mining simple sequential patterns over a target data stream, so little research has addressed finding more interesting and useful sequential patterns that effectively reflect the characteristics of the data stream. This paper proposes a method for mining frequent sequential patterns over data streams with a gap constraint on the occurrence of their elements, which helps to find more interesting sequential patterns. First, the meaning of the gap of a sequential pattern and the concept of a gap-constrained frequent sequential pattern are defined; then a mining method for efficiently finding gap-constrained frequent sequential patterns over a data stream is proposed.
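To make the idea of a gap constraint concrete, the following is a minimal Python sketch and not the method proposed in the paper. The abstract does not give the formal definitions, so everything here is an assumption made for illustration: a sequence is an ordered list of single items, the gap between two consecutive matched items is the number of elements lying between them, the constraint is an upper bound `max_gap`, and support is computed over a finite batch of sequences rather than a stream. The function names `occurs_with_max_gap` and `gap_constrained_support` are hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def occurs_with_max_gap(sequence: Sequence[Hashable],
                        pattern: Sequence[Hashable],
                        max_gap: int) -> bool:
    """Return True if `pattern` occurs in `sequence` as a subsequence in which
    at most `max_gap` elements lie between consecutive matched items.

    The gap definition is an assumption of this sketch, not the paper's.
    """
    if not pattern:
        return True

    def extend(prev_pos: int, k: int) -> bool:
        # Try to match pattern[k:] given that pattern[k-1] matched at prev_pos.
        if k == len(pattern):
            return True
        # Allowed positions satisfy pos - prev_pos - 1 <= max_gap.
        last = min(len(sequence), prev_pos + max_gap + 2)
        for pos in range(prev_pos + 1, last):
            # Backtracking is needed: the earliest match is not always
            # extendable under a gap constraint.
            if sequence[pos] == pattern[k] and extend(pos, k + 1):
                return True
        return False

    return any(sequence[start] == pattern[0] and extend(start, 1)
               for start in range(len(sequence)))


def gap_constrained_support(sequences: Iterable[Sequence[Hashable]],
                            pattern: Sequence[Hashable],
                            max_gap: int) -> float:
    """Fraction of sequences that contain `pattern` within the gap bound."""
    batch = list(sequences)
    if not batch:
        return 0.0
    hits = sum(occurs_with_max_gap(s, pattern, max_gap) for s in batch)
    return hits / len(batch)


if __name__ == "__main__":
    batch = ["abc", "axxc", "acb", "ac"]  # toy data, one item per character
    for g in (0, 1, 2):
        print(g, gap_constrained_support(batch, "ac", g))
```

On this toy batch, the pattern "ac" is counted in more sequences as `max_gap` grows, which illustrates how a tighter gap bound filters out occurrences whose items lie far apart. A streaming method would additionally have to maintain such counts incrementally, which this sketch does not attempt.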
