
Mining High Utility Sequential Patterns Using Sequence Utility Lists

  • Received : 2017.10.13
  • Accepted : 2017.12.18
  • Published : 2018.02.28

Abstract

High utility sequential pattern (HUSP) mining has been regarded as an important research topic in data mining. Although several algorithms have been proposed for this problem, they suffer from a large search space when mining HUSPs. A tighter utility upper bound on a sequence allows more unpromising patterns to be pruned early in the search. In this paper, we propose the sequence expected utility (SEU), a new utility upper bound on each sequence, defined as the maximum expected utility of the sequence and all of its descendant sequences. A sequence utility list is built for each pattern as a new data structure that maintains the information essential for mining HUSPs. We devise an algorithm, High Sequence Utility List-Span (HSUL-Span), that identifies HUSPs by employing SEU. Experimental results on both synthetic and real datasets from different domains show that HSUL-Span generates considerably fewer candidate patterns and outperforms other algorithms in execution time.
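To make the terms above concrete, here is a minimal sketch that computes pattern utility on a tiny quantitative sequence database. It assumes the common USpan-style definitions: a pattern's utility in a sequence is the maximum over all of its matches, and its total utility is the sum over the sequences that contain it. For pruning, it uses the sequence-weighted utility (SWU), a widely used but looser upper bound. SWU is a stand-in here and is not the paper's SEU. The sequence utility list and HSUL-Span are not implemented. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A q-itemset maps each item to its purchased quantity (internal utility).
QItemset = Dict[str, int]
QSequence = List[QItemset]
# A pattern is an ordered list of itemsets, e.g. [("a",), ("c",)] is <a, c>.
Pattern = List[Tuple[str, ...]]


def sequence_utility(qseq: QSequence, profit: Dict[str, int]) -> int:
    """Utility of a whole q-sequence: sum of quantity * unit profit."""
    return sum(q * profit[i] for itemset in qseq for i, q in itemset.items())


def utility_in(pattern: Pattern, qseq: QSequence,
               profit: Dict[str, int]) -> Optional[int]:
    """Maximum utility of `pattern` over all its matches in `qseq`,
    or None if `qseq` does not contain `pattern`."""
    best: Optional[int] = None

    def search(k: int, start: int, acc: int) -> None:
        nonlocal best
        if k == len(pattern):
            best = acc if best is None else max(best, acc)
            return
        # Try every later itemset that contains all items of pattern[k].
        for j in range(start, len(qseq)):
            itemset = qseq[j]
            if all(i in itemset for i in pattern[k]):
                gain = sum(itemset[i] * profit[i] for i in pattern[k])
                search(k + 1, j + 1, acc + gain)

    search(0, 0, 0)
    return best


def pattern_utility(pattern: Pattern, db: List[QSequence],
                    profit: Dict[str, int]) -> int:
    """Total utility: sum of per-sequence maximum utilities."""
    total = 0
    for s in db:
        u = utility_in(pattern, s, profit)
        if u is not None:
            total += u
    return total


def swu(pattern: Pattern, db: List[QSequence], profit: Dict[str, int]) -> int:
    """Sequence-weighted utility: total utility of every q-sequence that
    contains the pattern. It bounds the utility of the pattern and of
    every extension of it, so SWU < min_util prunes the whole subtree."""
    return sum(sequence_utility(s, profit) for s in db
               if utility_in(pattern, s, profit) is not None)


# Hypothetical toy data for illustration only.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 2}, {"b": 1, "c": 3}],
    [{"a": 1, "b": 2}, {"a": 3}, {"c": 1}],
    [{"b": 1}, {"c": 2}],
]
min_util = 20

if __name__ == "__main__":
    for pat in ([("b",)], [("a",), ("c",)], [("a", "b")]):
        u, ub = pattern_utility(pat, db, profit), swu(pat, db, profit)
        if u >= min_util:
            status = "HUSP"
        elif ub < min_util:
            status = "pruned with all extensions"
        else:
            status = "not a HUSP, keep extending"
        print(pat, u, ub, status)
```

With `min_util = 20`, <b> (utility 20) is a HUSP. <a, c> (utility 14, SWU 31) is not a HUSP, but it cannot be pruned. <(a b)> (SWU 19) is pruned together with all of its extensions. The gap between 14 and 31 shows why the abstract argues for a tighter bound such as SEU. SEU's exact definition is given in the body of the paper and is not reproduced here.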
