http://dx.doi.org/10.3745/KIPSTB.2004.11B.6.723

Web Page Classification System based upon Ontology  

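Choi Jaehyuk (Samsung Electronics, Network Business Division)
Seo Haesung (Ajou University, Graduate School of Information and Communication, Department of Information and Communication Engineering)
Noh Sanguk (The Catholic University of Korea, School of Computer Science and Information Engineering)
Choi Kyunghee (Ajou University, Graduate School of Information and Communication)
Jung Gihyun (Ajou University, Division of Electronics Engineering)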
Abstract
In this paper, we present an automated Web page classification system based on an ontology. First, to identify the representative terms for a given set of classes, we compute the product of term frequency and document frequency for each term. Second, we rank each term by its information gain, that is, by how well it discriminates between classes. Using machine learning algorithms, we then compile pairs of the selected terms and Web page classes into rules. The compiled rules classify any Web page into the categories defined in a domain ontology. In the experiments, 78 of 240 terms were identified as representative features for a given set of Web pages. The resulting classification accuracy was 83.52% on average.
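The two term-selection steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: term frequency is summed over the documents of one class, document frequency counts the documents of that class containing the term, and information gain is computed over the presence or absence of a term across all labeled documents. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_df_scores(class_docs):
    """Score each term in one class as (total term frequency) * (document frequency).

    class_docs: list of token lists, all belonging to the same class (assumed input shape).
    """
    tf = Counter()
    df = Counter()
    for tokens in class_docs:
        tf.update(tokens)       # every occurrence counts toward term frequency
        df.update(set(tokens))  # each document counts once toward document frequency
    return {term: tf[term] * df[term] for term in tf}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(term, docs, labels):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for part in (present, absent):
        if part:  # skip an empty side of the split
            conditional += len(part) / len(labels) * entropy(part)
    return entropy(labels) - conditional


# Toy data (hypothetical): two "sports" pages and two "finance" pages.
docs = [
    ["soccer", "goal", "team"],
    ["soccer", "match", "team"],
    ["stock", "market", "price"],
    ["stock", "price", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

sports_scores = tf_df_scores(docs[:2])
vocabulary = sorted({t for d in docs for t in d})
ranked = sorted(vocabulary, key=lambda t: information_gain(t, docs, labels), reverse=True)
```

In this toy data, a term such as "soccer" that appears only in one class separates the classes perfectly, whereas "team" appears in both classes and therefore receives a lower information gain. Terms ranked this way could then serve as features for a rule learner.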
Keywords
Web Page Classification; Ontology; Information Gain; Machine Learning