User Profile Generation using Visual Differences of HTML Document

User Profile Generation Using Visual Analysis of HTML Documents

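  • Ju-hyun Kwak (Department of Computer, Information and Communication Engineering, Graduate School, Konkuk University)
  • Chang-hoon Lee (Department of Computer, Information and Communication Engineering, Konkuk University)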
  • Published : 2000.06.01

Abstract

This study proposes a way to improve how well web agents identify the web documents a user prefers. Web agents typically use TFIDF to learn user preferences, and TFIDF treats every word in a document as equally important. Web documents such as HTML pages, however, mark the importance of words visually, for example with different font sizes and highlighting. This study therefore weights each word according to the visual importance of the paragraph in which it appears. To do this, a profile is built from each paragraph, and the per-paragraph profiles are then consolidated into a single profile. The proposed algorithm was evaluated by comparing it with the conventional TFIDF algorithm on how well each helps users find the documents they prefer.
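The weighting idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the tag weights in `TAG_WEIGHTS`, the helper names `weighted_tf` and `build_profile`, the simple tokenizer, and the smoothed IDF formula (one common variant) are all hypothetical. Each text segment contributes its term counts multiplied by the weight of the most emphasized enclosing tag, which amounts to building one small profile per segment and summing them.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical visual weights; the abstract does not state the actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 3.0, "h2": 2.5, "h3": 2.0,
               "b": 1.5, "strong": 1.5, "em": 1.2}
DEFAULT_WEIGHT = 1.0  # plain body text


class WeightedTermParser(HTMLParser):
    """Accumulates term counts scaled by the weight of the enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []                  # currently open weighted tags
        self.weighted_tf = defaultdict(float)

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Remove the most recently opened occurrence of this tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # Use the strongest emphasis among the open tags (an assumption).
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags),
                     default=DEFAULT_WEIGHT)
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.weighted_tf[term] += weight


def weighted_tf(html):
    """Return visually weighted term frequencies for one HTML document."""
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return dict(parser.weighted_tf)


def build_profile(docs):
    """Consolidate weighted TF * IDF scores over a list of preferred documents."""
    tfs = [weighted_tf(d) for d in docs]
    df = Counter(term for tf in tfs for term in tf)   # document frequency
    n = len(docs)
    profile = defaultdict(float)
    for tf in tfs:
        for term, w in tf.items():
            idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
            profile[term] += w * idf
    return dict(profile)


if __name__ == "__main__":
    docs = ["<h1>web agent</h1><p>web page</p>", "<p>web news</p>"]
    print(weighted_tf(docs[0]))
    print(build_profile(docs))
```

In this sketch a word that appears in a heading counts three times as much as the same word in body text, whereas plain TFIDF would count both occurrences equally. Choosing the maximum weight for nested tags is only one option; multiplying nested weights would be another.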
