An Efficient Approach for Single-Pass Mining of Web Traversal Sequences

A Single-Scan Technique for Discovering Web Traversal Patterns

  • Received : 2010.08.19
  • Accepted : 2010.08.26
  • Published : 2010.10.15

Abstract

Web access sequence mining discovers the web pages that users access frequently. Utility-based web access sequence mining handles non-binary occurrences of web pages and extracts more useful knowledge from web logs. However, the existing utility-based approach considers web access sequences from the very beginning of the web logs, so it is not suitable for mining data streams, where the volume of data is huge and unbounded. Nor can it adaptively detect recent changes of knowledge in data streams. The existing approach has other limitations as well: it considers only forward references of web access sequences, relies on a level-wise candidate generation-and-test methodology, and needs several database scans. In this paper, we propose a new approach for high utility web access sequence mining over data streams with a sliding window method. Our approach not only handles large-scale data but also efficiently discovers recently generated information from data streams. Moreover, it overcomes the other limitations of the existing algorithm over data streams. Extensive performance analyses show that our approach is very efficient and outperforms the existing algorithm.
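To make the sliding window idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that each session is a list of (page, utility) pairs with positive integer utilities, that the utility of a contiguous page sequence in a session is the largest sum of its page utilities over all occurrences, and that the window holds a fixed number of batches. The names `session_utilities`, `SlidingWindowUtility`, `window_size`, and `max_len` are hypothetical.

```python
from collections import Counter, deque


def session_utilities(session, max_len=3):
    """Map each contiguous page sequence (up to max_len pages) in one
    session to its utility. Assumption: the utility of an occurrence is
    the sum of its page utilities, and a session contributes the best
    (largest) occurrence of each sequence."""
    best = {}
    for start in range(len(session)):
        seq, total = (), 0
        for page, util in session[start:start + max_len]:
            seq += (page,)
            total += util
            if total > best.get(seq, 0):
                best[seq] = total
    return best


class SlidingWindowUtility:
    """Keeps utility totals for the most recent `window_size` batches."""

    def __init__(self, window_size, max_len=3):
        self.window_size = window_size
        self.max_len = max_len
        self.batches = deque()   # per-batch contributions, oldest first
        self.totals = Counter()  # utility totals over the current window

    def add_batch(self, sessions):
        # Summarise the new batch once, then fold it into the totals.
        contrib = Counter()
        for session in sessions:
            contrib.update(session_utilities(session, self.max_len))
        self.batches.append(contrib)
        self.totals.update(contrib)
        # Evict the oldest batch when the window is full.
        if len(self.batches) > self.window_size:
            for seq, util in self.batches.popleft().items():
                self.totals[seq] -= util
                if self.totals[seq] <= 0:
                    del self.totals[seq]

    def high_utility(self, min_utility):
        """Sequences whose utility in the current window reaches the threshold."""
        return {s: u for s, u in self.totals.items() if u >= min_utility}


window = SlidingWindowUtility(window_size=2)
window.add_batch([[("A", 2), ("B", 3)]])
window.add_batch([[("B", 1), ("C", 4)]])
window.add_batch([[("A", 1), ("C", 6)]])  # the first batch is evicted here
print(window.high_utility(5))
```

Storing one summary per batch means each log record is read once, and evicting a batch only subtracts its stored contribution, so old data never needs to be rescanned. This sketch enumerates every short contiguous sequence and therefore does not scale the way a dedicated tree structure would.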

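With the rapid growth of Internet use, much research has been actively conducted to provide more convenient Internet services. Techniques for discovering frequently occurring web page traversal sequences from web log data have also been widely studied for the purpose of designing effective web sites. However, existing methods all require multiple database scans, which makes it difficult to discover web page traversal sequences quickly and in real time from continuously generated web log data. In addition, incremental and interactive mining capabilities are needed to process continuously generated web log data. In this paper, we propose a method that discovers frequently occurring web page traversal sequences from continuously generated web log data with a single scan, in an incremental and interactive manner. The proposed method uses a WTS (web traversal sequence)-tree structure, and various experiments demonstrate that it is effective and outperforms existing methods.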
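The abstract does not describe the internals of the WTS-tree, so the following is only a generic illustration of single-scan, incremental, and interactive mining with a prefix tree, not the paper's structure. It assumes that a traversal sequence is a contiguous run of pages within a session and that support is the number of sessions containing that run. The names `Node`, `TraversalTrie`, and `max_len` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0       # number of sessions containing this path
    last_sid: int = -1   # id of the last session that touched this node
    children: dict = field(default_factory=dict)


class TraversalTrie:
    """Prefix tree over contiguous page sequences of length <= max_len."""

    def __init__(self, max_len=4):
        self.root = Node()
        self.max_len = max_len
        self.sessions = 0

    def add_session(self, pages):
        """Insert one session; each session is read exactly once."""
        sid = self.sessions
        self.sessions += 1
        for start in range(len(pages)):
            node = self.root
            for page in pages[start:start + self.max_len]:
                node = node.children.setdefault(page, Node())
                if node.last_sid != sid:  # count each session only once
                    node.last_sid = sid
                    node.count += 1

    def frequent(self, min_support):
        """Return sequences with support >= min_support by walking the
        tree only. A longer sequence never has higher support than its
        prefix, so infrequent branches are pruned."""
        result = {}
        stack = [((), self.root)]
        while stack:
            path, node = stack.pop()
            for page, child in node.children.items():
                if child.count >= min_support:
                    seq = path + (page,)
                    result[seq] = child.count
                    stack.append((seq, child))
        return result


trie = TraversalTrie()
for session in (["A", "B", "C"], ["A", "B", "D"], ["B", "C", "A", "B"]):
    trie.add_session(session)
print(trie.frequent(2))
print(trie.frequent(3))  # a new threshold needs no rescan of the log
```

New sessions can be added at any time with `add_session` (the incremental part), and `frequent` can be called repeatedly with different thresholds against the same tree (the interactive part).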
