[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5391/JKIIS.2007.17.6.849

A Method of Efficient Web Crawling Using URL Pattern Scripts

Chang, Moon-Soo (Department of Software, Seokyeong University)
Jung, June-Young (Department of Software, Seokyeong University)

Publication Information

Journal of the Korean Institute of Intelligent Systems / v.17, no.6, 2007, pp. 849-854

Abstract

It is difficult to collect only the target documents from the innumerable documents on the Web. One solution is to select target documents from Web sites that serve many documents in the target domain. In this paper, we propose an intelligent crawling method that collects the needed documents based on URL pattern scripts defined in XML. The proposed method applies efficiently to sites that serve structured information backed by a database. Using this crawling method, we collected 50 thousand Web documents.
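The abstract does not show the script format itself, so the following is only a minimal sketch of the general idea of URL filtering driven by an XML pattern script. The element names (site, pattern), the role attribute, the example.com URLs, and the functions load_rules and classify are all assumptions made for illustration and are not taken from the paper.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical pattern script. The schema below is invented for this
# sketch; the paper's actual XML format is not given in the abstract.
PATTERN_SCRIPT = r"""
<site base="http://example.com">
  <pattern role="follow">^/list\.php\?page=\d+$</pattern>
  <pattern role="collect">^/view\.php\?id=\d+$</pattern>
</site>
"""


def load_rules(script_xml):
    """Parse the script into (base URL, list of (role, compiled regex))."""
    root = ET.fromstring(script_xml.strip())
    base = root.get("base")
    rules = [
        (p.get("role"), re.compile(p.text.strip()))
        for p in root.findall("pattern")
    ]
    return base, rules


def classify(url, base, rules):
    """Return the role of the first pattern matching the URL, or None.

    "follow" marks a listing page to traverse, "collect" marks a target
    document to store. URLs on another host, or matching no pattern,
    are skipped (None).
    """
    target, site = urlsplit(url), urlsplit(base)
    if (target.scheme, target.netloc) != (site.scheme, site.netloc):
        return None
    # Match against path plus query string, since database-backed sites
    # typically distinguish documents by query parameters.
    path = target.path + ("?" + target.query if target.query else "")
    for role, regex in rules:
        if regex.match(path):
            return role
    return None
```

A crawler built along these lines would call classify on every extracted link, enqueue "follow" links, store "collect" pages, and drop everything else, so only URLs matching the script are ever fetched.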

Keywords

Web Crawling; URL; Pattern Script; URL Filtering;
