Discovering Frequent Itemsets Reflected User Characteristics Using Weighted Batch based on Data Stream

스트림 데이터 환경에서 배치 가중치를 이용하여 사용자 특성을 반영한 빈발항목 집합 탐사

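  • 서복일 (School of Electronics and Computer Engineering, Chonnam National University) ;
  • 김재인 (School of Electronics and Computer Engineering, Chonnam National University) ;
  • 황부현 (School of Electronics and Computer Engineering, Chonnam National University)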
  • Received : 2010.12.06
  • Accepted : 2011.01.03
  • Published : 2011.01.28

Abstract

Because a data stream is unbounded and continuous, it is difficult to discover frequent itemsets from the whole of the data. A specialized data mining method that reflects both the properties of the data and the requirements of users is therefore needed. In this paper, we propose FIMWB, a method for discovering frequent itemsets that reflects the property that recent events are more important than old ones. The data stream is split into batches according to a given time interval, and each batch is assigned a weight that reflects the user's greater interest in recent events. FP-Digraph then discovers the frequent itemsets using the result of FIMWB. Experimental results show that FIMWB reduces the generation of useless items and that FP-Digraph is better suited to a real-time environment than a tree-based method (FP-Tree).

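Because stream data is unbounded and continuous, it is difficult to discover frequent itemsets based on the entire data. For this reason, a specialized data mining method that reflects the characteristics of the data and of the user is needed. This paper proposes FIMWB, a method that discovers frequent items while reflecting the fact that users are more interested in recently generated data. FIMWB assigns each batch a variable weight according to the time interval between the time the past data occurred and the current time, thereby giving more interest and importance to the latest data. FP-Digraph builds a graph from the frequent items discovered by FIMWB and uses it to discover frequent itemsets. The experimental results show that FIMWB reduces the generation of unnecessary items and that the proposed FP-Digraph method is better suited to a stream data environment than tree-based (FP-Tree) frequent itemset discovery.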
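
The batch-weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' FIMWB implementation: the abstract does not give the weighting formula, so the exponential decay `decay ** age`, the normalized weighted support, and the names `split_into_batches`, `weighted_frequent_items`, `decay`, and `min_support` are all assumptions made for illustration. The sketch only finds single frequent items; the graph-based itemset step (FP-Digraph) is not shown.

```python
from collections import defaultdict


def split_into_batches(stream, interval):
    """Group (timestamp, items) transactions into batches by time interval.

    Returns a list of (batch_key, transactions) sorted from oldest to newest.
    """
    batches = defaultdict(list)
    for timestamp, items in stream:
        batches[timestamp // interval].append(set(items))
    return sorted(batches.items())


def weighted_frequent_items(batches, decay, min_support):
    """Return {item: weighted support} for items reaching min_support.

    Assumption (not from the paper): a batch that is `age` intervals older
    than the newest batch gets weight decay ** age, with 0 < decay <= 1.
    """
    if not batches:
        return {}
    newest_key = batches[-1][0]
    weighted_count = defaultdict(float)
    total_weight = 0.0
    for key, transactions in batches:
        weight = decay ** (newest_key - key)
        total_weight += weight * len(transactions)
        for transaction in transactions:
            for item in transaction:
                weighted_count[item] += weight
    return {
        item: count / total_weight
        for item, count in weighted_count.items()
        if count / total_weight >= min_support
    }


if __name__ == "__main__":
    # Hypothetical stream: item "a" appears only in the older batch.
    stream = [(0, "ab"), (1, "a"), (10, "bc"), (11, "c")]
    batches = split_into_batches(stream, interval=10)
    print(weighted_frequent_items(batches, decay=0.5, min_support=0.4))
    print(weighted_frequent_items(batches, decay=1.0, min_support=0.4))
```

With `decay=1.0` every batch counts equally and all three items reach the threshold. With `decay=0.5` the older batch is discounted, so item `a`, which appears only in old transactions, falls below it. This is the kind of pruning of stale items that the abstract attributes to batch weighting.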
