Mining highly attention itemsets using a two-way decay mechanism in data stream mining

  • 장중혁 (School of Computer and IT Engineering, Daegu University)
  • Received: 2014.12.12
  • Reviewed: 2015.02.25
  • Published: 2015.04.30

Abstract

In data stream mining, most techniques for differentiating the importance of information assign greater weight to recently generated information than to older information. However, some information generated long ago can still be highly significant. For example, when a once-regular customer of a retail store has not visited for some time, older records such as that customer's purchase history can be valuable for targeted marketing aimed at increasing sales. This paper defines highly attention itemsets (HAI): itemsets in a data stream that occurred frequently in the past but have occurred rarely in recent times, and are therefore of high interest. It also proposes a two-way decay mechanism and a data stream mining method for finding HAI efficiently.
