Similarity Measurement with Interestingness Weight for Improving the Accuracy of Web Transaction Clustering

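  • Kang Tae-Ho (Department of Information and Communication Engineering, Graduate School, Chungbuk National University) ;
  • Min Young-Soo (Department of Information and Communication Engineering, Graduate School, Chungbuk National University) ;
  • Yoo Jae-Soo (School of Electrical and Computer Engineering, Chungbuk National University)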
  • Published : 2004.06.01

Abstract

Recently, web-site personalization has been studied actively. Web personalization predicts the set of URLs most likely to interest each user through data mining approaches such as clustering. Most existing clustering-based methods represent web transactions as bit vectors that record whether a user visited each URL, and determine the similarity of two transactions by the degree to which their bit vectors match. However, because these methods consider only whether a URL was visited, the user's interest in the URL is excluded from clustering. As a result, web transactions with different visit purposes or inclinations may be classified into the same group. In this paper, we propose an enhanced transaction model with interestingness weights to solve this problem, together with a new similarity measure that exploits the proposed model. Performance evaluation shows that our similarity measure improves the accuracy of web transaction clustering over the existing method.
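
The contrast between bit-vector matching and interest-weighted matching can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula or its similarity measure, so everything below is an assumption made for illustration: the weights are hypothetical per-URL interest scores (for example, normalized viewing time), and cosine similarity stands in for whatever measure the authors actually use.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def to_bits(weights):
    """Collapse an interest-weighted transaction to a visited/not-visited bit vector."""
    return [1 if w > 0 else 0 for w in weights]


# Hypothetical transactions over four URLs. Each value is an assumed
# interest weight in [0, 1]; 0 means the URL was not visited.
t1 = [0.9, 0.1, 0.0, 0.8]  # mostly interested in URLs 0 and 3
t2 = [0.1, 0.9, 0.0, 0.1]  # same URLs visited, but interested in URL 1
t3 = [0.8, 0.2, 0.0, 0.9]  # interest profile close to t1

# Bit vectors: all three transactions visited the same URLs,
# so they are indistinguishable.
bit_sim_12 = cosine(to_bits(t1), to_bits(t2))
bit_sim_13 = cosine(to_bits(t1), to_bits(t3))

# Weighted vectors: t1 is far from t2 but close to t3.
w_sim_12 = cosine(t1, t2)
w_sim_13 = cosine(t1, t3)

print(f"bit vectors: sim(t1,t2)={bit_sim_12:.3f}  sim(t1,t3)={bit_sim_13:.3f}")
print(f"weighted:    sim(t1,t2)={w_sim_12:.3f}  sim(t1,t3)={w_sim_13:.3f}")
```

In this toy data the bit-vector similarities are identical for both pairs, while the weighted similarity separates t2 from t1 and keeps t3 close to it, which is the kind of distinction the abstract says visit-only models cannot make.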
