Predicting Interesting Web Pages by SVM and Logit-regression

  • Received : 2015.01.07
  • Accepted : 2015.02.24
  • Published : 2015.03.31

Abstract

Automated detection of interesting web pages could be used in many different application domains. A user's interest in a web page can be determined implicitly by observing the user's behavior. Distinguishing interesting web pages is a classification problem, and we empirically test two white-box learning methods: fixed-effect logit regression and the support vector machine (SVM). The results indicated that (1) fixed-effect logit regression and fixed-effect SVMs with polynomial and radial basis kernels outperformed the linear-kernel model; (2) personalization is critical for improving model performance; (3) when asking a user to grade web pages explicitly, the scale can be as simple as a yes/no answer; and (4) for each additional second spent on a web page, the odds of the page being interesting increased by a factor of 1.004, whereas the number of scrollbar clicks (p=0.56) and the number of mouse clicks (p=0.36) had no statistically significant relationship with interest.

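The per-second factor of 1.004 reported in result (4) can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the figure is a logit-regression odds ratio (the exponential of the duration coefficient) and that its effect compounds multiplicatively over time. The function names and the baseline probability of 0.5 are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math

# Factor reported in the abstract: each extra second on a page multiplies
# the odds of the page being interesting by 1.004.
ODDS_RATIO_PER_SECOND = 1.004


def logit_coefficient(odds_ratio: float) -> float:
    """Return the logit coefficient implied by an odds ratio (beta = ln OR)."""
    return math.log(odds_ratio)


def odds_multiplier(seconds: float,
                    odds_ratio_per_second: float = ODDS_RATIO_PER_SECOND) -> float:
    """Return the factor by which the odds change after `seconds` extra seconds."""
    return odds_ratio_per_second ** seconds


def updated_probability(baseline_probability: float, seconds: float) -> float:
    """Apply the odds multiplier to a baseline probability (hypothetical input)."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_multiplier(seconds)
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    print("implied coefficient per second:", logit_coefficient(ODDS_RATIO_PER_SECOND))
    print("odds multiplier after 60 s:", odds_multiplier(60))
    # Assumed baseline: a page that is interesting with probability 0.5.
    print("probability after 60 s:", updated_probability(0.5, 60))
```

Under these assumptions, one additional minute on a page multiplies the odds by roughly 1.27, so a small per-second effect still adds up over a typical page visit.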