Performance Analysis of Sliding Window based Stream High Utility Pattern Mining Methods

  • Ryang, Heungmo (Dept. of Computer Engineering, Sejong University)
  • Yun, Unil (Dept. of Computer Engineering, Sejong University)
  • Received: 2016.08.10
  • Reviewed: 2016.11.08
  • Published: 2016.12.31

Abstract

Recently, massive stream data have been generated in real time by applications such as wireless sensor networks, the Internet of Things, and social network services. Developing efficient methods that process and analyze such data to discover useful information, and that make this information available for decision making, has therefore become an important issue. Because stream data are generated continuously and rapidly, they must be processed with a minimum number of accesses, and suitable methods are required so that they can be analyzed in resource-limited environments where fast, low-power processing is necessary. The sliding window model has been proposed and studied to address this issue. Meanwhile, pattern mining, one of the data mining techniques for finding meaningful information in large data, extracts such information in the form of patterns. Traditional frequent pattern mining handles only binary databases and treats all items as equally important; although it has played an essential role in the data mining field, it cannot reflect the characteristics of real-world data. High utility pattern mining was proposed to discover more meaningful information from non-binary databases by taking the relative importance of items into account. High utility pattern mining methods designed for static databases, however, are not suitable for processing stream data. Sliding window based high utility pattern mining has therefore been proposed to find significant information in resource-limited environments by reflecting the characteristics of stream data and processing them efficiently. In this paper, we evaluate sliding window based high utility pattern mining algorithms through various experiments on datasets and analyze the results to study the characteristics of the algorithms and directions for their improvement.
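
To make the two ideas in the abstract concrete, the following is a minimal Python sketch of utility computation over a sliding window. It is not any of the algorithms evaluated in the paper: it assumes a window that slides one transaction at a time, uses a brute-force enumeration of itemsets, and the `PROFIT` table, the transactions, and the threshold `min_util` are all hypothetical values chosen for illustration.

```python
from collections import deque
from itertools import combinations

# Hypothetical per-item unit profits (relative item importance); illustrative only.
PROFIT = {"a": 5, "b": 2, "c": 1}

def itemset_utility(itemset, transaction):
    """Utility of an itemset in one transaction: sum of quantity * profit,
    or 0 if the transaction does not contain every item of the itemset."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * PROFIT[item] for item in itemset)

def high_utility_itemsets(window, min_util):
    """Brute-force search: return every itemset whose total utility
    over the transactions in the window is at least min_util."""
    items = sorted({item for t in window for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            total = sum(itemset_utility(itemset, t) for t in window)
            if total >= min_util:
                result[itemset] = total
    return result

# Sliding window over the 3 most recent transactions (assumed size);
# deque(maxlen=3) discards the oldest transaction automatically.
window = deque(maxlen=3)

# Non-binary transactions: each maps an item to its purchased quantity.
stream = [
    {"a": 1, "b": 2},
    {"b": 1, "c": 4},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
for transaction in stream:
    window.append(transaction)

result = high_utility_itemsets(window, min_util=15)
print(result)
```

Unlike frequency, utility does not shrink monotonically as itemsets grow, so practical methods cannot prune the search with support counts alone; the exhaustive enumeration here is only workable for a handful of items.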
