http://dx.doi.org/10.3745/KIPSTB.2005.12B.4.499

A Design and Implementation of Web Robot by Using Genre-based Categorization and Subject-based Categorization  

Lee Yong-Bae (Department of Computer Education, Jeonju National University of Education)
Abstract
Existing web robots, which gather enormous amounts of data by traversing the Internet, remain limited when the goal is to collect specialized information. This paper therefore analyzes the functions and application areas of current web robots and identifies their limitations in collecting specialized information. We then define the functions a web robot needs for that purpose and describe the structure we designed. Two critical functions are applied to the web robot: genre-based categorization, which classifies text by document type, and content-based categorization, which classifies it by subject. Above all, genre-based categorization serves as the fundamental feature that enables the web robot to collect the target documents efficiently.
Keywords
Web Robot; Automatic Classification
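
The two-stage filtering described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the cue-word tables, the overlap scoring, and the names `best_label`, `filter_page`, and `Decision` are all assumptions made for illustration, since the abstract does not state which features or classifiers the system uses. The sketch only shows the control flow, in which a genre check runs first and a subject check runs only on pages of the wanted genre.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue words (assumption): the abstract does not give the real features.
GENRE_CUES = {
    "research_paper": {"abstract", "references", "keywords", "introduction"},
    "product_page": {"price", "cart", "shipping", "buy"},
}
SUBJECT_CUES = {
    "machine_learning": {"classifier", "training", "bayes", "categorization"},
    "networking": {"router", "packet", "tcp", "latency"},
}


def best_label(text, cues):
    """Return the label whose cue words overlap the text most, or None if no cue matches."""
    tokens = set(text.lower().split())
    scores = {label: len(tokens & words) for label, words in cues.items()}
    label = max(scores, key=scores.get)  # ties resolve to the first label in the table
    return label if scores[label] > 0 else None


@dataclass
class Decision:
    keep: bool
    genre: Optional[str]
    subject: Optional[str]


def filter_page(text, target_genre, target_subject):
    """Stage 1: genre check. Stage 2: subject check, only for pages of the target genre."""
    genre = best_label(text, GENRE_CUES)
    if genre != target_genre:
        # Wrong document type: reject without running the subject stage.
        return Decision(False, genre, None)
    subject = best_label(text, SUBJECT_CUES)
    return Decision(subject == target_subject, genre, subject)


if __name__ == "__main__":
    page = "Abstract We compare a Bayes classifier for text categorization References"
    print(filter_page(page, "research_paper", "machine_learning"))
```

Running the genre stage first means that pages of the wrong document type are discarded before any subject scoring is done, which is one plausible reading of why the abstract calls genre-based categorization the fundamental feature.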