
A Hypertext Categorization Method using Incrementally Computable Class Link Information  

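Oh, Hyo-Jung (Electronics and Telecommunications Research Institute, Human Information Processing Department)
Myaeng, Sung-Hyoun (Chungnam National University, Department of Computer Science)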
Abstract
As the WWW grows at an increasing speed, classifiers targeted at hypertext have come into high demand. While document categorization is quite mature, the issue of utilizing hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks. In comparison against a recently proposed technique that appears to be the only one of its kind, we obtained an improvement of up to 18.5% in effectiveness while dramatically reducing the processing time. We attempt to explain through experiments what factors contribute to the improvement.
Keywords
Automatic document categorization; Hypertext categorization; Hyperlink learning